LNAI 2654 




Ute Schmid 



Inductive Synthesis 
of Functional Programs 



Universal Planning, Folding of Finite Programs, 
and Schema Abstraction by Analogical Reasoning 





Lecture Notes in Artificial Intelligence 

Edited by J. G. Carbonell and J. Siekmann 
Subseries of Lecture Notes in Computer Science 



2654 



Springer 

Berlin 
Heidelberg 
New York 
Hong Kong 
London 
Milan 
Paris 
Tokyo 






Series Editors 

Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA 
Jörg Siekmann, University of Saarland, Saarbrücken, Germany 

Author 
Ute Schmid 

University of Osnabrück 

Institute of Computer Science 

Department of Mathematics and Computer Science 

Albrechtstr. 28, 49069 Osnabrück, Germany 

E-mail: schmid@informatik.uni-osnabrueck.de 



Cataloging-in-Publication Data applied for 

A catalog record for this book is available from the Library of Congress. 

Bibliographic information published by Die Deutsche Bibliothek 

Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; 

detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>. 



CR Subject Classification (1998): I.2.2, I.2.3, I.2.4, I.2.8, D.1.2, F.3.1, F.4.1, D.2.1 1 



ISSN 0302-9743 

ISBN 3-540-40174-1 Springer-Verlag Berlin Heidelberg New York 



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are 
liable for prosecution under the German Copyright Law. 

Springer-Verlag Berlin Heidelberg New York 

a member of BertelsmannSpringer Science+Business Media GmbH 

http://www.springer.de 

© Springer-Verlag Berlin Heidelberg 2003 
Printed in Germany 

Typesetting: Camera-ready by author, data conversion by Boller Mediendesign 
Cover illustration: "Nachtigall 11" by Heinrich Neuy 

Printed on acid-free paper SPIN: 10932261 06/3142 5 4 3 2 1 0 



For My Parents and 
In Memory of My Grandparents 



Foreword 



Analogical reasoning is ubiquitous, whether in everyday common sense rea- 
soning, in scientific discovery, or anywhere in between. Examples of analogi- 
cal reasoning range from scientific theory creation, such as Bohr’s planetary 
model of the atom, to problem solving where a teacher’s solution of an illus- 
trative example problem is used to guide the student in solving new, similar 
problems. Psychologists have studied how people reason analogically, though 
often severely simplifying the reasoning task in order to run controlled experi- 
ments. Artificial intelligence researchers, including this writer, have built com- 
putational models that exhibit various forms of analogical transfer, ranging 
from simple copy-and-modify processes to complex derivational-trace track- 
ing and rejustifying reasoning steps for new problems. The underlying issues 
are not simple. For instance, in drawing an analogy, what should be kept in- 
variant, what should be modified or mapped, and what should be discarded? 
At what level of reasoning is analogy most profitably applied - i.e., should 
the solution to a problem be transferred and modified, should the derivation 
of the solution be transferred instead, or should the underlying principles in- 
voked in the derivation be the primary transfer vehicle? How does analogical 
reasoning interact with classical deduction or with inductive reasoning? And 
how can a solution drawn analogically be formally verified or refuted, in the 
sense of formal proof checking? These and other key issues lie at the heart of 
analogical reasoning research. 

Artificial intelligence researchers and cognitive psychologists have ad- 
dressed subsets of the analogical reasoning challenge. However, until now 
there has not been a true marriage of the psychological and the computa- 
tional in the realm of analogical reasoning. Although both camps cite each 
other and mutually benefit from new results, Ute Schmid is the first to de- 
velop, implement, test, and evaluate analogical reasoning models in depth 
based directly on data from subjects performing that reasoning. 

This book is a very thorough and clear report of Dr. Schmid’s deep anal- 
ysis of inductive and analogical reasoning, combining key aspects of artificial 
intelligence and algorithms on the one hand, and cognitive psychology on the 
other. Compared with related work, the comprehensive nature of the analog- 
ical reasoning model is evident: The case-based reasoning (CBR) community 
focuses primarily on indexing and retrieving relevant past cases, rather than 
deriving new solutions or solving significantly different problems. Earlier ana- 
logical reasoning work focused directly on problem solving - how to use past 
solutions for similar problems to help construct the solution to the new prob- 
lem. Veloso combined CBR and analogical reasoning, enabling large-scale 
problem solving from second-principles. Subsequently, analogical reasoning 
has seen new extensions such as Melis’s method for analogical construction 
of mathematical proofs and the use of analogy in intelligent tutoring systems. 
This book combines all the aspects of analogical reasoning, extends it to in- 
clude other forms of inductive and deductive reasoning, and directly ties the 
computational methods to psychological results. 



June 2003 



Jaime Carbonell 



Preface 



In this book a novel approach to inductive synthesis of recursive functions 
is proposed, combining universal planning, folding of finite programs, and 
schema abstraction by analogical reasoning. In a first step, an example do- 
main of small complexity is explored by universal planning. For example, for 
all possible lists over four fixed natural numbers, their optimal transformation 
sequences into the sorted list are calculated and represented as a DAG. In a 
second step, the plan is transformed into a finite program term. Plan transfor- 
mation mainly relies on inferring the data type underlying a given plan. In a 
third step, the finite program is folded into (a set of) recursive functions. Fold- 
ing is performed by syntactical pattern-matching and corresponds to inducing 
a special class of context-free tree grammars. It is shown that the approach 
can be successfully applied to learn domain-specific control rules. Control 
rule learning is important to gain the efficiency of domain-specific planners 
without the need to hand-code the domain-specific knowledge. Furthermore, 
an extension of planning based on a purely relational domain description 
language to function applications is presented. This extension makes plan- 
ning applicable to a larger class of problems that are of interest for program 
synthesis. 
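
The sorting example above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the DPlan algorithm described later in the book: the function name `universal_plan`, the encoding of states as tuples, and the choice of a swap operator over arbitrary pairs of positions are all assumptions made here. The sketch searches backward from the sorted list and records, for every state, each swap that starts an optimal transformation sequence, so that the recorded edges form a DAG.

```python
from collections import deque
from itertools import combinations


def universal_plan(goal):
    """Backward breadth-first search from the goal state (illustrative sketch).

    Returns (dist, succ): dist[s] is the minimal number of swaps from s to
    the goal, and succ[s] lists every (i, j, next_state) that begins an
    optimal plan from s. Together, the succ edges form a DAG.
    """
    goal = tuple(goal)
    # Assumption: a swap may exchange any two positions.
    swaps = list(combinations(range(len(goal)), 2))

    def apply_swap(state, i, j):
        s = list(state)
        s[i], s[j] = s[j], s[i]
        return tuple(s)

    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        for i, j in swaps:
            # A swap is its own inverse, so predecessors are found by swapping.
            pred = apply_swap(state, i, j)
            if pred not in dist:
                dist[pred] = dist[state] + 1
                queue.append(pred)

    # Keep only edges that move one step closer to the goal.
    succ = {s: [] for s in dist}
    for state in dist:
        for i, j in swaps:
            nxt = apply_swap(state, i, j)
            if dist[nxt] == dist[state] - 1:
                succ[state].append((i, j, nxt))
    return dist, succ


dist, succ = universal_plan([1, 2, 3, 4])
```

Under these assumptions the plan covers all 24 lists over the four numbers, and no list needs more than three swaps. A different operator set (for example, swapping only adjacent positions) would give a different DAG.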

As a last step, a hierarchy of program schemes (patterns) is generated 
by generalizing over already synthesized recursive functions. Generalization 
can be considered as the last step of problem solving by analogy or program- 
ming by analogy. Some psychological experiments were performed to inves- 
tigate which kind of structural relations between problems can be exploited 
by human problem-solvers. Anti-unification is presented as an approach to 
mapping and generalizing program structures. It is proposed that the inte- 
gration of planning, program synthesis, and analogical reasoning contributes 
to cognitive science research on skill acquisition by addressing the problem 
of extracting generalized rules from some initial experience. Such (control) 
rules represent domain-specific problem-solving strategies. 
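
As a rough illustration of anti-unification, the following sketch computes a least general generalization of two first-order terms. It is a simplified first-order version written for this overview; the term encoding (strings for constants, tuples for function applications), the function name `anti_unify`, and the generated variable names `X1`, `X2`, ... are assumptions and are not taken from the implementation described in the book.

```python
def anti_unify(s, t, table=None):
    """Least general generalization of two first-order terms (sketch).

    A term is either a string (a constant) or a tuple
    (functor, arg1, ..., argn). Variable names X1, X2, ... are assumed
    not to clash with constants occurring in the input terms.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # Same functor and arity: generalize argument by argument.
        args = tuple(anti_unify(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Differing subterms: the same pair is always mapped to the same variable.
    if (s, t) not in table:
        table[(s, t)] = "X%d" % (len(table) + 1)
    return table[(s, t)]


s = ("plus", ("succ", "0"), ("succ", "0"))
t = ("plus", ("pred", "3"), ("pred", "3"))
generalization = anti_unify(s, t)
```

Here `generalization` is `("plus", "X1", "X1")`: the common structure is kept, and the pair of differing subterms is replaced by a single shared variable.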

All parts of the approach are implemented in Common Lisp. 

Acknowledgements. Research is a kind of work one can never do com- 
Eyferth, Bernd Mahr, Jaime Carbonell, Arnold Upmeyer, Peter Pepper, and 
Gerhard Strube. I learned a lot from discussions with them, with colleagues, 
and students, such as Jochen Burghardt, Bruce Burns, the group of Hartmut 
Ehrig, Pierre Flener, Hector Geffner, Peter Geibel, Peter Gerjets, Jürgen 
Giesl, Wolfgang Grieskamp, Maritta Heisel, Ralf Herbrich, Laurie Hiyaku- 
moto, Petra Hofstedt, Rune Jensen, Emanuel Kitzelmann, Jana Koehler, 
Steffen Lange, Martin Mühlpfordt, Brigitte Pientka, Heike Pisch, Manuela 
Veloso, Ulrich Wagner, Bernhard Wolf, and Thomas Zeugmann (sorry to ev- 
eryone I forgot). I am very grateful for the time I could spend at Carnegie 
Mellon University. Thanks to Klaus Eyferth and Bernd Mahr who motivated 
me to go, to Gerhard Strube for his support, to Fritz Wysotzki who accepted 
my absence from teaching, and, of course, to Jaime Carbonell who was my 
very helpful host. My work profited much from the inspiration I got from 
talks, classes, and discussions, and from the very special atmosphere suggest- 
ing that everything is all right as long as “the heart is in the work.” I thank 
all my diploma students who supported the work reported in this book - 
Dirk Matzke, Rene Mercy, Martin Mühlpfordt, Marina Müller, Mark Müller, 
Heike Pisch, Knut Polkehn, Uwe Sinha, Imre Szabo, Janin Toussaint, Ulrich 
Wagner, Joachim Wirth, and Bernhard Wolf. Additional thanks to some of 
them and Peter Pollmanns for proof-reading parts of the draft of this book. I 
owe a lot to Fritz Wysotzki for giving me the chance to move from cognitive 
psychology to artificial intelligence, for many interesting discussions, and for 
critically reading and commenting on the draft of this book. Finally, thanks 
to my colleagues and friends Berry Claus, Robin Hornig, Barbara Kaup, and 
Martin Kindsmüller, to my family, and my husband Uwe Konerding for sup- 
port and high-quality leisure time, and to all authors of good crime novels. 



Table of Contents 



1. Introduction 1 



Part I. Planning 



2. State-Based Planning 13 

2.1 Standard Strips 13 

2.1.1 A Blocks-World Example 14 

2.1.2 Basic Definitions 14 

2.1.3 Backward Operator Application 18 

2.2 Extensions and Alternatives to Strips 20 

2.2.1 The Planning Domain Definition Language 20 

2.2.2 Situation Calculus 24 

2.3 Basic Planning Algorithms 27 

2.3.1 Informal Introduction of Basic Concepts 28 

2.3.2 Forward Planning 29 

2.3.3 Formal Properties of Planning 32 

2.3.4 Backward Planning 35 

2.4 Planning Systems 40 

2.4.1 Classical Approaches 40 

2.4.2 Current Approaches 42 

2.4.3 Complex Domains and Uncertain Environments 44 

2.4.4 Universal Planning 45 

2.4.5 Planning and Related Fields 48 

2.4.6 Planning Literature 50 

2.5 Automatic Knowledge Acquisition for Planning 51 

2.5.1 Pre-planning Analysis 51 

2.5.2 Planning and Learning 51 

3. Constructing Complete Sets of Optimal Plans 55 

3.1 Introduction to DPlan 55 

3.1.1 DPlan Planning Language 56 

3.1.2 DPlan Algorithm 57 






3.1.3 Efficiency Concerns 58 

3.1.4 Example Problems 59 

3.2 Optimal Full Universal Plans 64 

3.3 Termination, Soundness, Completeness 66 

3.3.1 Termination of DPlan 66 

3.3.2 Operator Restrictions 67 

3.3.3 Soundness and Completeness of DPlan 70 

4. Integrating Function Application in Planning 71 

4.1 Motivation 71 

4.2 Extending Strips to Function Applications 74 

4.3 Extensions of FPlan 79 

4.3.1 Backward Operator Application 79 

4.3.2 Introducing User-Defined Functions 81 

4.4 Examples 82 

4.4.1 Planning with Resource Variables 83 

4.4.2 Planning for Numerical Problems 85 

4.4.3 Functional Planning for Standard Problems 87 

4.4.4 Mixing ADD/DEL Effects and Updates 88 

4.4.5 Planning for Programming Problems 88 

4.4.6 Constraint Satisfaction and Planning 90 

5. Conclusions and Further Research 93 

5.1 Comparing DPlan with the State of the Art 93 

5.2 Extensions of DPlan 94 

5.3 Universal Planning versus Incremental Exploration 95 



Part II. Inductive Program Synthesis 



6. Automatic Programming 99 

6.1 Overview of Automatic Programming Research 100 

6.1.1 AI and Software Engineering 100 

6.1.2 Approaches to Program Synthesis 102 

6.1.3 Pointers to Literature 109 

6.2 Deductive Approaches 110 

6.2.1 Constructive Theorem Proving 110 

6.2.2 Program Transformation 115 

6.3 Inductive Approaches 124 

6.3.1 Foundations of Induction 124 

6.3.2 Genetic Programming 134 

6.3.3 Inductive Logic Programming 140 

6.3.4 Inductive Functional Programming 150 






6.4 Final Comments 164 

6.4.1 Inductive versus Deductive Synthesis 164 

6.4.2 Inductive Functional versus Logic Programming 165 

7. Folding of Finite Program Terms 167 

7.1 Terminology and Basic Concepts 168 

7.1.1 Terms and Term Rewriting 168 

7.1.2 Patterns and Anti-unification 171 

7.1.3 Recursive Program Schemes 172 

7.2 Synthesis of RPSs from Initial Programs 182 

7.2.1 Folding and Fixpoint Semantics 182 

7.2.2 Characteristics of RPSs 182 

7.2.3 The Synthesis Problem 185 

7.3 Solving the Synthesis Problem 185 

7.3.1 Constructing Segmentations 186 

7.3.2 Constructing a Program Body 195 

7.3.3 Dealing with Further Subprograms 198 

7.3.4 Finding Parameter Substitutions 206 

7.3.5 Constructing an RPS 215 

7.4 Example Problems 220 

7.4.1 Time Effort of Folding 220 

7.4.2 Recursive Control Rules 222 

8. Transforming Plans into Finite Programs 227 

8.1 Overview of Plan Transformation 228 

8.1.1 Universal Plans 228 

8.1.2 Introducing Data Types and Situation Variables 228 

8.1.3 Components of Plan Transformation 229 

8.1.4 Plans as Programs 229 

8.1.5 Completeness and Correctness 231 

8.2 Transformation and Type Inference 231 

8.2.1 Plan Decomposition 231 

8.2.2 Data Type Inference 233 

8.2.3 Introducing Situation Variables 234 

8.3 Plans over Sequences of Objects 235 

8.4 Plans over Sets of Objects 240 

8.5 Plans over Lists of Objects 246 

8.5.1 Structural and Semantic List Problems 246 

8.5.2 Synthesizing ‘Selection-Sort’ 248 

8.5.3 Concluding Remarks on List Problems 257 

8.6 Plans over Complex Data Types 259 

8.6.1 Variants of Complex Finite Programs 259 

8.6.2 The ‘Tower’ Domain 260 

8.6.3 Tower of Hanoi 267 






9. Conclusions and Further Research 271 

9.1 Combining Planning and Program Synthesis 271 

9.2 Acquisition of Problem Solving Strategies 272 

9.2.1 Learning in Problem Solving and Planning 272 

9.2.2 Three Levels of Learning 273 



Part III. Schema Abstraction 



10. Analogical Reasoning and Generalization 279 

10.1 Analogical and Case-Based Reasoning 279 

10.1.1 Characteristics of Analogy 279 

10.1.2 Sub-processes of Analogical Reasoning 281 

10.1.3 Transformational versus Derivational Analogy 282 

10.1.4 Quantitative and Qualitative Similarity 283 

10.2 Mapping Simple Relations or Complex Structures 284 

10.2.1 Proportional Analogies 284 

10.2.2 Causal Analogies 286 

10.2.3 Problem Solving and Planning by Analogy 286 

10.3 Programming by Analogy 288 

10.4 Pointers to Literature 290 

11. Structural Similarity in Analogical Transfer 291 

11.1 Analogical Problem Solving 291 

11.1.1 Mapping and Transfer 292 

11.1.2 Transfer of Non-isomorphic Source Problems 293 

11.1.3 Structural Representation of Problems 294 

11.1.4 Non-isomorphic Variants in a Water Redistribution Domain 296 

11.1.5 Measurement of Structural Overlap 300 

11.2 Experiment 1 300 

11.2.1 Method 302 

11.2.2 Results and Discussion 303 

11.3 Experiment 2 305 

11.3.1 Method 306 

11.3.2 Results and Discussion 307 

11.4 General Discussion 309 

12. Programming by Analogy 311 

12.1 Program Reuse and Program Schemes 311 

12.2 Restricted 2nd-order Anti-unification 312 

12.2.1 Recursive Program Schemes Revisited 312 

12.2.2 Anti-unification of Program Terms 314 






12.3 Retrieval Using Term Subsumption 316 

12.3.1 Term Subsumption 316 

12.3.2 Empirical Evaluation 317 

12.3.3 Retrieval from Hierarchical Memory 318 

12.4 Generalizing Program Schemes 319 

12.5 Adaptation of Program Schemes 320 

13. Conclusions and Further Research 323 

13.1 Learning and Applying Abstract Schemes 323 

13.2 A Framework for Learning from Problem Solving 324 

13.3 Application Perspective 325 

Bibliography 327 



Appendices 



A. Implementation Details 343 

A.1 Short History of DPlan 343 

A.2 Modules of DPlan 345 

A.3 DPlan Specifications 345 

A.4 Development of Folding Algorithms 347 

A.5 Modules of TFold 348 

A.6 Time Effort of Folding 349 

A.7 Main Components of Plan-Transformation 349 

A.8 Plan Decomposition 350 

A.9 Introduction of Situation Variables 351 

A.10 Number of MSTs in a DAG 351 

A.11 Extracting Minimal Spanning Trees from a DAG 352 

A.12 Regularizing a Tree 353 

A.13 Programming by Analogy Algorithms 355 

B. Concepts and Proofs 357 

B.1 Fixpoint Semantics 357 

B.2 Proof: Maximal Subprogram Body 360 

B.3 Proof: Uniqueness of Substitutions 365 

C. Sample Programs and Problems 369 

C.1 Fibonacci with Sequence Referencing Function 369 

C.2 Inducing ‘Reverse’ with Golem 370 

C.3 Finite Program for ‘Unstack’ 373 

C.4 Recursive Control Rules for the ‘Rocket’ Domain 375 

C.5 The ‘Selection Sort’ Domain 376 

C.6 Recursive Control Rules for the ‘Tower’ Domain 377 






C.7 Water Jug Problems 383 

C.8 Example RPSs 388 

Index 391 



List of Figures 



1.1 Analogical Problem Solving and Learning 5 

1.2 Main Components of the Synthesis System 6 

2.1 A Simple Blocks-World 14 

2.2 A Strips Planning Problem in the Blocks-World 18 

2.3 An Alternative Representation of the Blocks-World 19 

2.4 Representation of a Blocks-World Problem in PDDL-Strips 21 

2.5 Blocks-World Domain with Equality Constraints and Conditioned Effects 22 

2.6 Representation of a Blocks-World Problem in Situation Calculus 26 

2.7 A Forward Search Tree for Blocks-World 31 

2.8 A Backward Search Tree for Blocks-World 36 

2.9 Goal-Regression for Blocks-World 37 

2.10 The Sussman Anomaly 39 

2.11 Part of a Planning Graph as Constructed by Graphplan 43 

2.12 Representation of the Boolean Formula f(x1, x2) = x1 ∧ x2 as OBDD 47 

3.1 The Clearblock DPlan Problem 60 

3.2 DPlan Plan for Clearblock 60 

3.3 Clearblock with a Set of Goal States 60 

3.4 The DPlan Rocket Problem 61 

3.5 Universal Plan for Rocket 62 

3.6 The DPlan Sorting Problem 62 

3.7 Universal Plan for Sorting 63 

3.8 The DPlan Hanoi Problem 63 

3.9 Universal Plan for Hanoi 64 

3.10 Universal Plan for Tower 64 

3.11 Minimal Spanning Tree for Rocket 67 

4.1 Tower of Hanoi (a) Without and (b) With Function Application 73 

4.2 Tower of Hanoi in Functional Strips (Geffner, 1999) 74 

4.3 A Plan for Tower of Hanoi 80 

4.4 Tower of Hanoi with User-Defined Functions 82 

4.5 A Problem Specification for the Airplane Domain 83 

4.6 Specification of the Inverse Operator fly^{-1} for the Airplane Domain 84 






4.7 A Problem Specification for the Water Jug Domain 86 

4.8 A Plan for the Water Jug Problem 86 

4.9 Blocks-World Operators with Indirect Reference and Update 89 

4.10 Specification of Selection Sort 89 

4.11 Lightmeal in Constraint Prolog 91 

4.12 Problem Specification for Lightmeal 91 

6.1 Programs Represent Concepts and Skills 125 

6.2 Construction of a Simple Arithmetic Function (a) and an Even-2-Parity Function (b) Represented as a Labeled Tree with Ordered Branches (Koza, 1992, figs. 6.1, 6.2) 135 

6.3 A Possible Initial State, an Intermediate State, and the Goal State for Block Stacking (Koza, 1992, figs. 18.1, 18.2) 137 

6.4 Resulting Programs for the Block Stacking Problem (Koza, 1992, chap. 18.1) 139 

6.5 θ-Subsumption Equivalence and Reduced Clauses 142 

6.6 θ-Subsumption Lattice 143 

6.7 An Inverse Linear Derivation Tree (Lavrač and Džeroski, 1994, pp. 46) 144 

6.8 Part of a Refinement Graph (Lavrač and Džeroski, 1994, p. 56) 146 

6.9 Specifying Modes and Types for Predicates 147 

6.10 Learning Function unpack from Examples 152 

6.11 Traces for the unpack Example 153 

6.12 Result of the First Synthesis Step for unpack 155 

6.13 Recurrence Relation for unpack 155 

6.14 Traces for the reverse Problem 159 

6.15 Synthesis of a Regular Lisp Program 162 

6.16 Recursion Formation with Tinker 164 

7.1 Example Term with Exemplaric Positions of Sub-terms 170 

7.2 Example First Order Pattern 172 

7.3 Anti-Unification of Two Terms 173 

7.4 Examples for Terms Belonging to the Language of an RPS and of a Subprogram of an RPS 176 

7.5 Unfolding Positions in the Third Unfolding of Fibonacci 179 

7.6 Valid Recurrent Segmentation of Mod 187 

7.7 Initial Program for ModList 190 

7.8 Identifying Two Recursive Subprograms in the Initial Program for ModList 200 

7.9 Inferring a Sub-Program Scheme for ModList 201 

7.10 The Reduced Initial Tree of ModList 202 

7.11 Substitutions for Mod 207 

7.12 Steps for Calculating a Subprogram 216 

7.13 Overview of Inducing an RPS 218 

7.14 Time Effort for Unfolding/Folding Factorial 220 






7.15 Time Effort for Unfolding/Folding Fibonacci 221 

7.16 Time Effort Calculating Valid Recurrent Segmentations and Substitutions for Factorial 221 

7.17 Initial Tree for Clearblock 223 

7.18 Initial Tree for Tower of Hanoi 224 

8.1 Induction of Recursive Functions from Plans 227 

8.2 Examples of Uniform Sub-Plans 232 

8.3 Uniform Plans as Subgraphs 233 

8.4 Generating the Successor-Function for a Sequence 236 

8.5 The Unstack Domain and Plan 237 

8.6 Protocol for Unstack 238 

8.7 Introduction of Data Type Sequence in Unstack 238 

8.8 LISP-Program for Unstack 239 

8.9 Partial Order of Set 240 

8.10 Functions Inferred/Provided for Set 242 

8.11 Sub-Plans of Rocket 244 

8.12 Introduction of the Data Type Set (a) and Resulting Finite Program (b) for the Unload-All Sub-Plan of Rocket (Ω denotes “undefined”) 244 

8.13 Protocol of Transforming the Rocket Plan 245 

8.14 Partial Order (a) and Total Order (b) of Flat Lists over Numbers 247 

8.15 A Minimal Spanning Tree Extracted from the SelSort Plan 252 

8.16 The Regularized Tree for SelSort 254 

8.17 Introduction of a “Semantic” Selector Function in the Regularized Tree 256 

8.18 LISP-Program for SelSort 258 

8.19 Abstract Form of the Universal Plan for the Four-Block Tower 264 

9.1 Three Levels of Generalization 274 

10.1 Mapping of Base and Target Domain 281 

10.2 Example for a Geometric-Analogy Problem (Evans, 1968, p. 333) 285 

10.3 Context Dependent Descriptions in Proportional Analogy (O’Hara, 1992) 285 

10.4 The Rutherford Analogy (Gentner, 1983) 286 

10.5 Base and Target Specification (Dershowitz, 1986) 288 

11.1 Types and degrees of structural overlap between source and target problems 295 

11.2 A water redistribution problem 297 

11.3 Graphs for the equations 2·x + 5 = 9 (a) and 3·x + (6 − 2) = 16 (b) 301 

12.1 Adaptation of Sub to Add 321 






C.1 Universal Plan for Sorting Lists with Three Elements 376 

C.2 Minimal Spanning Trees for Sorting Lists with Three Elements 377 

C.3 Minimal Spanning Trees for Sorting Lists with Three Elements 378 

C.4 Minimal Spanning Trees for Sorting Lists with Three Elements 378 



List of Tables 



2.1 Informal Description of Forward Planning 29 

2.2 A Simple Forward Planner 31 

2.3 Number of States in the Blocks-World Domain 33 

2.4 Planning as Model Checking Algorithm (Giunchiglia, 1999, fig. 4) 47 

3.1 Abstract DPlan Algorithm 58 

4.1 Database with Distances between Airports 84 

4.2 Performance of FPlan: Tower of Hanoi 88 

4.3 Performance of FPlan: Selection Sort 90 

6.1 Different Specifications for Last 103 

6.2 Training Examples 127 

6.3 Background Knowledge 127 

6.4 Fundamental Results of Language Learnability (Gold, 1967, tab. 1) 132 

6.5 Genetic Programming Algorithm (Koza, 1992, p. 77) 136 

6.6 Calculation the Fibonacci Sequence 139 

6.7 Learning the daughter Relation 141 

6.8 Calculating an rlgg 143 

6.9 Simplified MIS-Algorithm (Lavrač and Džeroski, 1994, pp. 54) 145 

6.10 Background Knowledge and Examples for Learning reverse(X, Y) . 149 

6.11 Constructing Traces from I/O Examples 153 

6.12 Calculating the Form of an S-Expression 154 

6.13 Constructing a Regular Lisp Program by Function Merging 161 

7.1 A Sample of Function Symbols 169 

7.2 Example for an RPS 174 

7.3 Recursion Points and Substitution Terms for the Fibonacci Function 178 

7.4 Unfolding Positions and Unfolding Indices for Fibonacci 180 

7.5 Example of Extrapolating an RPS from an Initial Program 183 

7.6 Calculation of the Next Position on the Right 195 

7.7 Factorial and Its Third Unfolding with Instantiation succ(succ(0)) (a) and pred(3) (b) 196 

7.8 Segments of the Third Unfolding of Factorial for Instantiation succ(succ(0)) (a) and pred(3) (b) 196 






7.9 Anti-Unification for Incomplete Segments 198 

7.10 Variants of Substitutions in Recursive Calls 207 

7.11 Testing whether Substitutions are Uniquely Determined 209 

7.12 Testing whether a Substitution is Recurrent 210 

7.13 Testing the Existence of Sufficiently Many Instances for a Variable 211 

7.14 Determining Hidden Variables 213 

7.15 Calculating Substitution Terms of a Variable in a Recursive Call . 214 

7.16 Equality of Subprograms 219 

7.17 RPS for Factorial with Constant Expression in Main 222 

8.1 Introducing Sequence 235 

8.2 Linear Recursive Functions 239 

8.3 Introducing Set 241 

8.4 Structural Functions over Lists 247 

8.5 Introducing List 248 

8.6 Dealing with Semantic Information in Lists 249 

8.7 Functional Variants for Selection-Sort 250 

8.8 Extract an MST from a DAG 253 

8.9 Regularization of a Tree 253 

8.10 Structural Complex Recursive Functions 260 

8.11 Transformation Sequences for Leaf-Nodes of the Tower Plan for Four Blocks 265 

8.12 Power-Set of a List, Set of Lists 266 

8.13 Control Rules for Tower Inferred by Decision List Learning 267 

8.14 A Tower of Hanoi Program 268 

8.15 A Tower of Hanoi Program for Arbitrary Starting Constellations 269 

10.1 Kinds of Predicates Mapped in Different Types of Domain Comparison (Gentner, 1983, Tab. 1, extended) 280 

10.2 Word Algebra Problems (Reed et al., 1990) 287 

11.1 Relevant information for solving the source problem 299 

11.2 Results of Experiment 1 304 

11.3 Results of Experiment 2 308 

12.1 A Simple Anti-Unification Algorithm 315 

12.2 An Algorithm for Retrieval of RPSs 317 

12.3 Results of the similarity rating study 318 

12.4 Example Generalizations 320 



1. Introduction 



She had written what she felt herself called upon to write; and, though she 
was beginning to feel that she might perhaps do this thing better, she had no 
doubt that the thing itself was the right thing for her. 

— Harriet Vane in: Dorothy L. Sayers, Gaudy Night, 1935 



Automatic program synthesis has been an active area of research since the 
early seventies. The application goal of program synthesis is to support human 
programmers in developing correct and efficient program code and in reasoning 
about programs. Program synthesis is the core of knowledge-based software 
engineering. The second goal of program synthesis research is to gain more 
insight into the knowledge and strategies underlying the process of code 
generation. Developing algorithms for automatic program construction is 
therefore an area of artificial intelligence (AI) research, while human 
programmers are studied in cognitive science research. 

There are two main approaches to program synthesis - deduction and 
induction. Deductive program synthesis addresses the problem of deriving 
executable programs from high-level specifications. Typically, the employed 
transformation or inference rules guarantee that the resulting program is 
correct with respect to the given specification - but of course, there is no 
guarantee that the specification is valid with respect to the informal idea a 
programmer has about what the program should do. The challenge for 
deductive program synthesis is to provide formalized knowledge that allows 
as large a class of programs as possible to be synthesized with as little 
user guidance as possible. 
That is, a synthesis system can be seen as an expert system incorporating gen- 
eral knowledge about algorithms, data structures, optimization techniques, 
as well as knowledge about the specific programming domain. 

Inductive program synthesis investigates program construction from in- 
complete information, namely from examples for the desired input/output 
behavior of the program. Program behavior can be specified at different levels 
of detail: as pairs of input and output values, as a selection of computational 
traces, or as an ordered set of generic traces, abstracting from specific input 
values. For induction from examples (not to be confused with mathematical 
proofs by induction) it is not possible to give a notion of correctness. The 
resulting program has to cover all given examples correctly, but the user has 
to judge whether the generalized program corresponds to his/her intention.¹ 
Because correctness of code is crucial for software development, knowledge- 
based software engineering relies on deductive approaches to program syn- 
thesis. The challenge for inductive program synthesis is to provide learning 
algorithms that can generalize as large a class of programs as possible with 
as little background knowledge as possible. That is, inductive synthesis models 
the ability to extract structure in the form of - possibly recursive - rules from 
some initial experience. 

This book focuses on inductive program synthesis, and more specifically on 
the induction of recursive functions. There are different approaches to learning 
recursive programs from examples. The oldest approach is to synthesize 
functional (Lisp) programs. Functional synthesis is realized by a two-step 
process: In a first step, input/output examples are rewritten into finite terms 
which are integrated into a finite program. In a second step, the finite program 
is folded into a recursive function, that is, a program generalizing over the 
finite program is induced. The second step is also called “generalization-to-n” and 
corresponds to program synthesis from traces or programming by demonstra- 
tion. It will be shown later that folding of finite terms can be described as a 
grammar inference problem, namely as induction of a special class of context- 
free tree grammars. Since the late eighties, there have been two additional 
approaches to inductive synthesis - inductive logic programming and genetic 
programming. While the classical functional approach depends on exploiting 
structural information given in the examples, these approaches mainly depend 
on search in hypothesis space, that is, the space of syntactically correct 
programs of a given programming language. The work presented here is in 
the context of functional program synthesis. 
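
The relationship between a finite program and the recursive function it is folded into can be sketched as follows. This is a hand-written illustration, not the folding algorithm developed in this book: the factorial example, the term encoding, and the names `unfolding`, `evaluate`, and `folded` are assumptions chosen for the sketch. The finite term is defined only for the inputs its unfoldings cover and is undefined (marked `OMEGA`) beyond them, whereas the recursive function generalizes to arbitrary n.

```python
OMEGA = "Omega"  # marks the point where the finite program is undefined


def unfolding(depth):
    """Finite program term: `depth` unfoldings of a factorial-like scheme.

    Term forms: ("if0", test, then_term, else_term), ("mul", a, b),
    ("pred", a), the variable "x", integer constants, and OMEGA.
    """
    def build(arg, k):
        if k == 0:
            return OMEGA
        return ("if0", arg, 1, ("mul", arg, build(("pred", arg), k - 1)))
    return build("x", depth)


def evaluate(term, x):
    """Evaluate a finite term for input x; None means 'undefined'."""
    if term == OMEGA:
        return None
    if term == "x":
        return x
    if isinstance(term, int):
        return term
    op = term[0]
    if op == "pred":
        return evaluate(term[1], x) - 1
    if op == "mul":
        a, b = evaluate(term[1], x), evaluate(term[2], x)
        return None if a is None or b is None else a * b
    if op == "if0":
        test = evaluate(term[1], x)
        return evaluate(term[2], x) if test == 0 else evaluate(term[3], x)
    raise ValueError("unknown term: %r" % (term,))


def folded(x):
    """Recursive function that generalizes every finite unfolding."""
    return 1 if x == 0 else x * folded(x - 1)


finite_program = unfolding(3)
covered = [evaluate(finite_program, x) for x in range(3)]
```

With three unfoldings the finite term agrees with `folded` on the inputs 0, 1, and 2 and is undefined for 3. Folding has to recover the recursive definition from the regularity that is visible in such a finite term.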

While folding of finite programs into recursive functions can be performed 
(nearly) by purely syntactical pattern matching, generation of finite terms 
from input/output examples is knowledge-dependent. The result of rewrit- 
ing examples - that is, the form and complexity of the finite program - is 
completely dependent on the set of predefined primitive functions and data 
structures provided for the rewrite-system. Additionally, the outcome de- 
pends on the used rewrite-strategy - even for a constant set of background 
knowledge rewriting can result in different programs. Theoretically, there are 
infinitely many possible ways to represent a finite program which describes 
how input examples can be transformed into the desired output. Because 
generalizability depends on the form of the finite program, this first step is the 
bottleneck of program synthesis. Here program synthesis is confronted with 
the crucial problem of AI and cognitive science - problem solving success is 
determined by the constructed representation. 

^ Please note that throughout the book I will mostly refrain from giving both masculine 
and feminine forms and use only the masculine form for better readability. Also, I will refer 
to my work in first person plural instead of singular because part of the work 
was done in collaboration with other people and I do not want to switch between 
first and third person. 
Overview In the following, we give a short overview of the different 
aspects of inductive synthesis of recursive functions which are covered in this 
book. 

Universal Planning. We propose to use domain-independent planning to gen- 
erate finite programs. Inputs correspond to problem states, outputs to states 
fulfilling the desired goals, and transformation from input to output is real- 
ized by calculating optimal action sequences (shortest sequences of function 
applications). We use universal planning, that is, we consider all possible 
states of a given finite problem in a single plan. While a “standard” plan 
represents an (optimal) sequence of actions for transforming one specific initial 
state into a state fulfilling the given goals, a universal plan represents the set 
of all optimal plans as a DAG (directed acyclic graph). For example, a plan 
for sorting lists (or more precisely arrays) of four numbers 1, 2, 3, and 4, 
contains all sequences of swap operations to transform each of the 4! possible 
input lists into a list [1 2 3 4]. Plan construction is realized by a non-linear, 
state-based, backward algorithm. To make planning applicable to a larger 
class of problems which are of interest for program synthesis, we present an 
extension of planning from purely relational domain descriptions to function 
application. 
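
The sorting example can be made concrete with a small sketch. The following Python fragment is not the planner described in this book; it only illustrates the idea of a universal plan by running a backward breadth-first search from the goal list. The names `swap` and `universal_plan` are hypothetical, and we assume for illustration that a swap may exchange any two positions.

```python
from collections import deque
from itertools import combinations


def swap(state, i, j):
    """Return a copy of the tuple `state` with positions i and j exchanged."""
    items = list(state)
    items[i], items[j] = items[j], items[i]
    return tuple(items)


def universal_plan(goal):
    """Backward breadth-first search from `goal` (hypothetical helper).

    Returns (depth, successors): depth[s] is the length of a shortest swap
    sequence from s to goal; successors[s] lists the (swap, next_state) pairs
    that start some optimal plan from s. The successor edges form a DAG.
    """
    swaps = list(combinations(range(len(goal)), 2))
    depth = {goal: 0}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        for i, j in swaps:
            # A swap is its own inverse, so predecessors are found by swapping.
            pred = swap(state, i, j)
            if pred not in depth:
                depth[pred] = depth[state] + 1
                queue.append(pred)
    successors = {
        s: [((i, j), swap(s, i, j)) for i, j in swaps
            if depth[swap(s, i, j)] == depth[s] - 1]
        for s in depth
    }
    return depth, successors


depth, successors = universal_plan((1, 2, 3, 4))
print(len(depth), max(depth.values()))
```

Under these assumptions the search visits all 4! = 24 input lists, and following any successor edge from a state always shortens the remaining plan by one swap.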

Plan Transformation. The universal plan already represents the structure 
of the searched-for program, because it gives an ordering of operator ap- 
plications. Nevertheless, the plan cannot be generalized directly, but some 
transformation steps are needed to generate a finite program which can be 
input to a generalization-to-n algorithm. For a program to be generalizable 
into a (terminating) recursive function, the input states have to be classi- 
fied with respect to their “size”, that is, the objects involved in the planning 
problem must have an underlying order. We propose a method by which the 
data type underlying a given plan can be inferred. This information can be 
exploited in plan transformation to introduce a “constructive” representation 
of input states, together with an “empty”-test and selector functions. 

Folding of Finite Programs. Folding of a finite program into a recursive 
program (or a set of such programs) is done by pattern-matching. For inductive generalization, the 
notion of fix-point semantics can be “inverted” : A given finite program is con- 
sidered as n-th unfolding of an unknown recursive program. The term can be 
folded into a recursion, if it can be decomposed such that each segment of the 
term matches the same sub-term (called skeleton) and if the instantiations of 
the skeleton with respect to each segment can be described by a unique 
substitution (set of replacements of variables in the skeleton by terms). Identifying 
the skeleton corresponds to learning a regular tree-grammar, identifying the 
substitution corresponds to an extension of the regular to a context-free tree 
grammar. 
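
To illustrate the "inverted" reading of unfolding, the following Python sketch builds the n-th unfolding of a factorial-like recursive equation and then checks whether a given finite term decomposes into segments that all match one skeleton, linked by the substitution x <- pred(x). This is a deliberately narrow illustration and not the folding algorithm of this book: the skeleton is fixed in advance rather than inferred, and all names (`OMEGA`, `skeleton`, `unfold`, `count_segments`) are assumptions made for the example.

```python
OMEGA = "Omega"  # marks the undefined subterm where the unfolding was cut off


def skeleton(arg, rec):
    """Assumed skeleton: if(eq0(x), 1, *(x, <recursive call>)) with x := arg."""
    return ("if", ("eq0", arg), "1", ("*", arg, rec))


def unfold(n, arg="x"):
    """n-th unfolding of G(x) = if(eq0(x), 1, *(x, G(pred(x))))."""
    if n == 0:
        return OMEGA
    return skeleton(arg, unfold(n - 1, ("pred", arg)))


def count_segments(term, arg="x"):
    """Return the number of segments if `term` decomposes into skeleton
    instances linked by the substitution x <- pred(x); otherwise None."""
    segments = 0
    while term != OMEGA:
        try:
            rec = term[3][2]            # position of the recursive call
        except (IndexError, TypeError):
            return None
        if term != skeleton(arg, rec):  # segment must instantiate the skeleton
            return None
        term, arg, segments = rec, ("pred", arg), segments + 1
    return segments


finite_program = unfold(3)
print(count_segments(finite_program))
```

A term whose second segment uses x instead of pred(x) is rejected, because no single substitution describes the step from one segment to the next.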

Our approach is independent of a specific programming language: terms 
are supposed to be elements of a term algebra and the inferred recursive 
program is represented as a recursive program scheme (RPS) over this term 
algebra. An RPS is defined as a “main program” (ground term) together with 
a system of possibly recursive equations over a term algebra consisting of 
finite sets of variables, function symbols, and function variables (representing 
names of user-defined functions). Folding results in the inference of a program 
scheme because the elements of the term algebra (namely the function sym- 
bols) can be arbitrarily interpreted. Currently, we assume a fixed interpreter 
function which maps RPSs into Lisp functions with known denotational and 
operational semantics. We restrict Lisp to its functional core disregarding 
global variables, variable assignments, loops, and other non-functional ele- 
ments of the language. As a consequence, an RPS corresponds to a concrete 
functional program. 
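
The point that function symbols of a scheme can be interpreted arbitrarily can be sketched in a few lines. The following Python fragment is only an illustration under our own assumptions (terms as nested tuples, a hypothetical `evaluate` function, and made-up symbol names `unit`, `op`, `eq0`, `pred`); it is not the interpreter function used in this book, which maps RPSs to Lisp.

```python
import operator


def evaluate(term, env, equations, interp):
    """Evaluate a term under an interpretation of its function symbols."""
    if isinstance(term, int):           # numeric constant
        return term
    if isinstance(term, str):           # variable, else interpreted constant
        return env[term] if term in env else interp[term]
    head, *args = term
    if head == "if":                    # evaluate only the selected branch
        cond, then, other = args
        chosen = then if evaluate(cond, env, equations, interp) else other
        return evaluate(chosen, env, equations, interp)
    values = [evaluate(a, env, equations, interp) for a in args]
    if head in equations:               # function variable: unfold its equation
        params, body = equations[head]
        return evaluate(body, dict(zip(params, values)), equations, interp)
    return interp[head](*values)        # primitive function symbol


# One recursive equation G(x) = if(eq0(x), unit, op(x, G(pred(x)))).
equations = {
    "G": (("x",), ("if", ("eq0", "x"), "unit", ("op", "x", ("G", ("pred", "x")))))
}
main = ("G", 4)                         # "main program": a ground term

base = {"eq0": lambda v: v == 0, "pred": lambda v: v - 1}
as_factorial = {**base, "unit": 1, "op": operator.mul}
as_sum = {**base, "unit": 0, "op": operator.add}

print(evaluate(main, {}, equations, as_factorial))
print(evaluate(main, {}, equations, as_sum))
```

The same scheme yields the factorial of 4 under the first interpretation and the sum 4 + 3 + 2 + 1 + 0 under the second.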

For an arbitrarily instantiated finite term which is input to the folder, a 
term algebra together with the valuation of the identified parameters of the 
main program is inferred as part of the folding process. The current system 
can deal with a variety of (syntactic) recursive structures - tail recursion, lin- 
ear recursion, tree recursion and combinations thereof - and it can deal with 
recursion over interdependent parameters. Furthermore, a constant initial 
segment of a term can be identified and used to construct the main program 
and it is possible to identify sub-patterns distributed over the term which can 
be folded separately - resulting in an RPS consisting of two or more recursive 
equations. Mutual recursion and non-primitive recursion are out of the scope 
of the current system. 

Scheme Abstraction and Analogical Problem Solving. A set of (synthesized) 
RPSs can be organized into an abstraction hierarchy of schemes by generaliz- 
ing over their common structure. For example, an RPS for calculating the fac- 
torial of a natural number and an RPS for calculating the sum over a natural 
number can be generalized to an RPS representing a simple linear recursion 
over natural numbers. This scheme can be generalized further when regarding 
simple linear recursive functions over lists. Abstraction is realized by intro- 
ducing function variables which can be instantiated by a (restricted) set of 
primitive function symbols. Abstraction is based on “mapping” two terms 
such that their common structure is preserved. We present two approaches 
to mapping - tree transformation and anti-unification. Tree transformation is 
based on transforming one term (tree) into another by substituting, inserting, 
and deleting nodes. The performed transformations define the mapping re- 
lation. Anti-unification is based on constructing an abstract term containing 
first and second order variables (i. e., object and function variables) together 
with a set of substitutions such that instantiating the abstract term results 
in the original terms. 
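
As a minimal sketch of anti-unification with object and function variables, the following Python fragment generalizes the bodies of a factorial and a sum function. It is an illustration under simplifying assumptions, not the algorithm presented later in this book: terms are nested tuples, differing function symbols of equal arity are replaced by function variables F1, F2, ..., differing subterms by object variables X1, X2, ..., and the helper names are hypothetical.

```python
def fresh(pairs, left, right, prefix):
    """Reuse one variable per distinct (left, right) pair of subterms/symbols."""
    key = (prefix, left, right)
    if key not in pairs:
        count = sum(1 for k in pairs if k[0] == prefix)
        pairs[key] = f"{prefix}{count + 1}"
    return pairs[key]


def anti_unify(s, t, pairs):
    """Return a common generalization of s and t; `pairs` records variables."""
    if s == t:
        return s
    if isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t):
        head = s[0] if s[0] == t[0] else fresh(pairs, s[0], t[0], "F")
        args = tuple(anti_unify(a, b, pairs) for a, b in zip(s[1:], t[1:]))
        return (head,) + args
    return fresh(pairs, s, t, "X")


def instantiate(term, sigma):
    """Apply a substitution to variables and function variables alike."""
    if isinstance(term, tuple):
        return tuple(instantiate(part, sigma) for part in term)
    return sigma.get(term, term)


fac = ("if", ("eq0", "x"), "1", ("*", "x", ("fac", ("pred", "x"))))
total = ("if", ("eq0", "x"), "0", ("+", "x", ("sum", ("pred", "x"))))

pairs = {}
general = anti_unify(fac, total, pairs)
sigma_fac = {var: key[1] for key, var in pairs.items()}
sigma_sum = {var: key[2] for key, var in pairs.items()}
print(general)
```

Instantiating the abstract term with either substitution reproduces the corresponding original term, which is the defining property stated above.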

Scheme abstraction can be integrated into a general model of analogi- 
cal problem solving (see fig. 1.1): For a given finite program representing 
some initial experience with a problem, that RPS is retrieved from mem- 
ory for which its n-th unfolding results in a “maximal similarity” to the 
current (target) problem. Instead of inducing a general solution strategy by 
generalization-to-n, the retrieved RPS is modified with respect to the 
mapping obtained between its unfolding and the target. Modification can involve 
a simple re-instantiation of primitive symbols or more complex adaptations. 
If an already abstracted scheme is adapted to the current problem, we speak 
of refinement. To obtain some insight into the pragmatics of analogical 
problem solving, we empirically investigated what type and degree of structural 
mapping between two problem structures must exist for successful analogical 
transfer in human problem solvers. 

Fig. 1.1. Analogical Problem Solving and Learning 



Integrating Planning, Program Synthesis, and Analogical Reasoning. The 
three components of our approach - planning, folding of finite terms, and 
analogical problem solving - can be used in isolation or integrated into a 
complex system. Input to the universal planner is a problem specification 
represented in an extended version of the standard Strips language, output 
is a universal plan, represented as a DAG. Input to the synthesis system is a 
finite program term, output is a recursive program scheme. Finite terms can 
be obtained in arbitrary ways, for example, by hand-coding or by recording 
program traces. We propose an approach of plan transformation (from a DAG 
to a finite program term) to combine planning with program synthesis. 
Generating finite programs by planning is especially suitable for domains where 
inputs can be represented as relational objects (sets of literals) and where 
operations can be described as manipulation of such objects (changing 
literals). This is true for blocks-world problems and puzzles (such as Tower of 
Hanoi) and for a variety of list-manipulating problems (such as sorting). For 
numerical problems - such as factorial or Fibonacci - we omit planning and start 
program synthesis from finite programs generated from hand-coded traces. Input 
to the analogy module is a finite program term, outputs are the RPS which 
is most similar to the term, its re-instantiation or adaptation to cover the 
current term, and an abstracted scheme, generalizing over the current and 
the re-used RPS. Figure 1.2 gives an overview of the described components 
and their interactions. All components are implemented in Common Lisp. 



Fig. 1.2. Main Components of the Synthesis System 



Contributions to Research In the following, we discuss our work in relation 
to different areas of research. Our work directly contributes to research 
in program synthesis and planning. To other research areas, such as software 
engineering and discovery learning, our work does not directly contribute, 
but might offer some new perspectives. 

A Novel Approach to Inductive Program Synthesis. Going back to the roots 
of early work in functional program synthesis and extending it, exploiting the 
experience of research available today, results in a novel approach to inductive 
program synthesis with several advantageous aspects: Adopting the original 
two-step process of first generating finite programs from examples and then 
generalizing over them results in a clear separation of the knowledge-dependent 
and the syntactic aspect of induction. Dividing the complex program 
synthesis problem into two parts allows us to address the sub-problems 
separately. The knowledge-dependent first part - generating finite programs from 
examples - can be realized by different approaches, including the 
rewrite-approach proposed in the seventies. We propose to use state-based 
planning. While there is a long tradition of combining (deductive) planning and 
deductive program synthesis, up to now there was no interaction between 
research on state-based planning and research on inductive program synthe- 
sis. State-based planning provides a powerful approach to calculate (opti- 
mal) transformation sequences from input states to a state fulfilling a set of 
goal relations by providing a powerful domain specification language together 
with a domain-independent search algorithm for plan construction. The 
second part - folding of finite programs - can be solved by a pattern-matching 
approach. The correspondence between folding program terms and inferring 
context-free tree grammars makes it possible to give an exact characterization 
of the class of recursive programs which can be induced. Defining pattern- 
matching for terms which are elements of an arbitrary term algebra makes 
the approach independent of a specific programming language. Synthesizing 
program schemes in contrast to programs allows for a natural combination 
of induction with analogical reasoning and learning. 

Learning Domain Specific Control Rules for Plans. Cross-fertilization 
between state-based planning and inductive program synthesis results in a 
powerful approach to learning domain-specific control rules for plans. Control rule 
learning is currently becoming a major interest in planning research: Although 
a variety of efficient domain-independent planners have been developed in 
the nineties, for demanding real world applications it is necessary to guide 
search for (optimal) plans by exploiting knowledge about the structure of 
the planning domain. Learning control rules makes it possible to gain the 
efficiency of domain-dependent planning without the need to hand-code such 
knowledge, which is a time-consuming and error-prone knowledge engineering task. Our 
functional approach to program synthesis is very well suited for control rule 
learning because in contrast to logic programs the control flow is represented 
explicitly. Furthermore, a lot of inductive logic programming systems do not 
provide an ordering for the induced clauses and literals. That is, evaluation 
of such clauses by a Prolog interpreter is not guaranteed to terminate with 
the desired result. In proof planning, as a special domain of planning in the 
area of mathematical theorem proving and program verification, the need 
for learning proof methods and control strategies is also recognized. Up to 
now, we have not applied our approach to this domain, but we see this as an 
interesting area of further research. 

Knowledge Acquisition for Software Engineering Tools. Knowledge-based 
software engineering systems are based on formalized knowledge of various 
general and domain-specific aspects of program development. For that reason, 
such systems are necessarily incomplete. It depends on the “genius” of the 
authors of such systems - their analytical insights about program structures 
and their abilities to explicate these insights - how large the set of domains 
and the class of programs is that can be supported. Similarly to control rules 
in planning, tactics are used to guide search. Furthermore, some proof 
systems provide a variety of proof methods. While execution of transformation 
or proof steps is performed autonomously, the selection of an appropriate tac- 
tic or method is performed interactively. We propose that these components 
of software engineering tools are candidates for learning. The first reason is 
that the guarantee of correctness, which is necessary for the fully automated 
parts of such systems, remains intact. Only the “higher level” strategic 
aspects of the system which depend on user interaction are subject to learning 
and the acquired tactics and methods can be accepted or rejected by the user. 
Our approach to program synthesis might complement this special kind of 
expert system by providing an approach to model how some aspects of such 
expertise develop with experience. 

Programming by Demonstration. With the growing number of computer 
users, most of them without programming skills, program synthesis from 
examples becomes relevant for practical applications. For example, watching 
a user's input behavior in a text processing system provides traces which can 
be generalized to macros (such as “write a letter-head”), watching a user's 
browsing behavior in the world-wide-web can be used to generate 
preference classes, or watching a user's inputs into a graphical editor might provide 
suggestions for the next actions to be performed. In the simplest case, the 
resulting programs are just sequences of parameterized operations. By applying 
inductive program synthesis, more sophisticated programs, involving loops, 
could be generated. A further application is to support beginning program- 
mers. A student might interact with an interpreter by first giving examples 
for the desired program behavior and then watch how a recursion is formed 
to generalize the program to the general case. 

Discovery Learning. Folding of finite programs and some aspects of plan 
transformation can be characterized as discovery learning. Folding of finite 
programs models the (human) ability to extract generalized rules by iden- 
tifying relevant structural aspects in perceptive inputs. This ability can be 
seen as the core of the flexibility of (human) cognition underlying the acquisi- 
tion of perceptual categories and linguistic concepts as well as the extraction 
of general strategies from problem solving experience. Furthermore, we will 
demonstrate for plan transformation how problem dependent selector func- 
tions can be extracted from a universal plan by a purely syntactic analysis 
of its structure. Because our approach to program synthesis is driven by 
the underlying structure of some initial experience (represented as universal 
plan), it is also more cognitively plausible than the search-driven synthesis 
of inductive logic and genetic programming. 

Integrating Learning by Doing and by Analogy. Cognitive science approaches 
to skill acquisition from problem solving and early work on learning macro 
operators from planning focus on combining sequences of primitive opera- 
tors into more complex ones by merging their preconditions and effects. In 
contrast, our work addresses the acquisition of problem solving strategies. 
Because domain-dependent strategies are represented as recursive program 
schemes, they capture the operational aspect of problem solving as well as 
the structure of a domain. Thereby we can deal with learning by induction 
and analogy in a unified way - showing how schemes can be acquired from 
problem solving and how such schemes can be used and generalized in ana- 
logical problem solving. Furthermore, the identification of data types from 
plans captures the evolution of perceptive chunks by identifying the rele- 
vant aspects of a problem description and defining an order over such partial 
descriptions. 

Organization of the Book The book is organized in three main parts 
- Planning, Inductive Program Synthesis, and Analogical Problem Solving 
and Learning. Each part starts with an overview of research, along with an 
introduction of the basic concepts and formalisms. Afterwards our own work 
is presented, including relations to other work. We finish by summarizing 
the contributions of our approach to the field and giving an outlook on further 
research. The overview chapters can be read independently of the 
research-specific chapters. The research-specific chapters presuppose that the reader is 
familiar with the concepts introduced in the overview chapters. The focus of 
our work is on inductive program synthesis and its application to control rule 
learning for planning. Additionally, we discuss relations to work on problem 
solving and learning in cognitive psychology. 

Part I: Planning. In chapter 2, first, the standard Strips language for specify- 
ing planning domains and problems and a semantics for operator application 
are introduced. Afterwards, extensions of the Strips language (ADL/PDDL) 
are discussed and contrasted with situation calculus. Basic algorithms for 
forward and backward planning are introduced. Complexity of planning and 
formal properties of planning algorithms are discussed. A short overview of 
the development of different approaches to planning is given and pre-planning 
analysis and learning are introduced as methods for obtaining domain-specific 
knowledge for making plan construction more efficient. In chapter 3, the 
non-linear, state-based, universal planner DPlan is introduced and in chapter 
4 an extension of the Strips language to function application is presented. In 
chapter 5, we evaluate our approach and discuss further work to be done. 

Part II: Inductive Program Synthesis. In chapter 6 we give a survey of auto- 
matic programming research, focussing on automatic program construction, 
that is, program synthesis. We give a short overview of constructive theorem 
proving and program transformation as approaches to deductive program 
synthesis. Afterwards, we present inductive program synthesis as a special 
case of machine learning. We introduce grammar inference as theoretical 
background for inductive program synthesis. Genetic programming, induc- 
tive logic programming and inductive functional programming are presented 
as three approaches to generalize recursive programs from incomplete specifi- 
cations. In chapter 7 we introduce the concept of recursive program schemes 
and present our approach to folding finite program terms. In chapter 8 we 
present our approach to bridging the gap between planning and program 
synthesis by transforming plans into program terms. In chapter 9, we evaluate 
our approach and discuss relations to human strategy learning. 

Part III: Analogical Problem Solving and Learning. In chapter 10 we give 
an overview of approaches to analogical and case-based reasoning, discussing 
similarity measures and qualitative concepts for structural similarity. In 
chapter 11 some psychological experiments concerning problem solving with 
non-isomorphic example problems are reported. In chapter 12 we present 
anti-unification as an approach to structure mapping and generalization along 
with preliminary ideas for adaptation of non-isomorphic structures. In chap- 
ter 13 we evaluate our approach and discuss further work to be done. 



Part I 
Planning 




2. State-Based Planning 



Plan it first and then take it. 

— Travis McGee in: John D. MacDonald, The Long Lavender Look, 1970 

Planning is a major sub-discipline of AI. A plan is defined as a sequence of 
actions for transforming a given state into a state which fulfills a predefined 
set of goals. Planning research deals with the formalization, implementation, 
and evaluation of algorithms for constructing plans. In the following, we first 
(sect. 2.1.1) introduce a language for representing problems called standard 
Strips. Along with defining the language, the basic concepts and notations 
of planning are introduced. Afterwards (sect. 2.2) some extensions to Strips 
are introduced and situation calculus is discussed as an alternative language 
formalism. In section 2.3 basic algorithms for plan construction based on 
forward and backward search are introduced. We discuss complexity results 
for planning and give results for termination, soundness, and completeness 
of planning algorithms. In section 2.4 we give a short survey of well-known 
planning systems and an overview of concepts often used in planning lit- 
erature and pointers to literature. Finally (sect. 2.5), pre-planning analysis 
and learning are introduced as approaches for acquisition of domain-specific 
knowledge which can be used to make plan construction more efficient. The 
focus is on state-based algorithms, including universal planning. Throughout 
the chapter we give illustrations using blocks-world examples. 



2.1 Standard Strips 

To introduce the basic concepts and notations of (state-based) planning, we 
first review Strips. The original Strips was proposed by Fikes and Nilsson 
(1971) and up to now it is used with slight modifications or some extensions 
by the majority of planning systems. The main advantage of Strips is that 
it has a strong expressive power (mainly due to the so-called closed world 
assumption, see below) and at the same time allows for efficient planning 
algorithms. Informal introductions to Strips are given in all AI text books, 
for example in Nilsson (1980) and Russell and Norvig (1995). 



U. Schmid: Inductive Synthesis of Functional Programs, LNAI 2654, pp. 13-54, 2003. 
© Springer-Verlag Berlin Heidelberg 2003 




2.1.1 A Blocks-World Example 

Let us look at a very restricted world, consisting of four distinct blocks. The 
blocks, called A, B, C, and D, are the objects of this blocks-world domain. 
Each state of the world can be described by the following relations: a block 
can be on the table, that is, the proposition ontable(A) is true in a state, where 
block A is lying on the table; a block can be clear, that is, the proposition 
clear(A) is true in a state, where no other block is lying on top of block A; 
and a block can be lying on another block, that is, the proposition on(B, C) 
is true in a state, where block B is lying immediately on top of block C. State 
changes can be achieved by applying one of the following two operators: 
putting a block on the table and putting a block on another block. Both 
operators have application conditions: A block can only be moved if it is 
clear and a block can only be put on another block, if this block is clear, 
too. Figure 2.1 illustrates this simple blocks-world example. If the goal for 
which a plan is searched is that A is lying on B and B is lying on C, the 
right-hand state is a goal state, because both propositions on(A, B) and 
on(B, C) are true. Note that other states, for example a tower with D on top 
of A, B, C, also fulfill the goal because it was not demanded that A has to 
be clear or that C has to be on the table. 




To formalize plan construction, a language (or different languages) for 
describing states, operators, and goals must be defined. Furthermore, it has 
to be defined how a state change can be (syntactically) calculated. 

2.1.2 Basic Definitions 

The Strips language is defined over literals with constant symbols and vari- 
ables as arguments. That is, no general terms (including function symbols) 
are considered. We use the notion of term in the following definition in this 
specific way. In chapter 8 we will introduce a language allowing for general 
terms. 

Definition 2.1.1 (Strips Language). The Strips language $\mathcal{L}_S(\mathcal{X}, \mathcal{C}, \mathcal{R})$ is 
defined over sets of variables $\mathcal{X}$, constant symbols $\mathcal{C}$, and relational symbols 
$\mathcal{R}$ in the following way: 

- Variables $x \in \mathcal{X}$ are terms. 

- Constant symbols $c \in \mathcal{C}$ are terms. 

- If $p \in \mathcal{R}$ is a relational symbol with arity $\alpha(p) = i$ and if $t_1, \ldots, t_i$ are 
terms, then $p(t_1, \ldots, t_i)$ is a formula. 
Formulas consisting of a single relational symbol are called (positive) 
literals. For short, we write $p(\mathbf{t})$. 

- If $p_1(\mathbf{t}_1), \ldots, p_n(\mathbf{t}_n)$ are formulas, then $\{p_1(\mathbf{t}_1), \ldots, p_n(\mathbf{t}_n)\}$ is a formula, 
representing the conjunction of literals. 

- There are no other Strips formulas. 

Formulas over $\mathcal{L}_S(\mathcal{C}, \mathcal{R})$, i.e., formulas not containing variables, are called 
ground formulas. A literal without variables is called atom. 

We write $\mathcal{L}_S$ as abbreviation for $\mathcal{L}_S(\mathcal{X}, \mathcal{C}, \mathcal{R})$. With $\mathcal{X}(F)$ we denote the 
variables occurring in formula $F$. 

For the blocks-world domain given in figure 2.1, $\mathcal{L}_S$ is defined over the 
following set of symbols: $\mathcal{C} = \{A, B, C, D\}$, $\mathcal{R} = \{on^2, clear^1, ontable^1\}$, where 
$p^i$ denotes a relational symbol of arity $i$. We will see below that for defining 
operators additionally a set of variables $\mathcal{X} = \{?x, ?y, ?z\}$ is needed. 

The Strips language can be used to represent states^ and to define syntac- 
tic rules for transforming a state representation by applying Strips operators. 
A state representation can be interpreted logically by providing a domain, i. e., 
a set of objects of the world, and a denotation for all constant and relational 
symbols. A state representation denotes all states of the world where the 
interpreted Strips relations are true. That is, a state representation denotes 
a family of states in the world. When interpreting a formula, it is assumed 
that all relations not given explicitly are false (the closed world assumption). 

Definition 2.1.2 (State Representation). A problem state $s$ is a 
conjunction of atoms. That is, $s \in \mathcal{L}_S(\mathcal{C}, \mathcal{R})$. 

That is, states are propositions (relations over constants). 

Examples of problem states are 

$s_1$ = {on(B, C), clear(A), clear(B), clear(D), ontable(A), ontable(C), 
ontable(D)} 

$s_2$ = {on(A, B), on(B, C), clear(A), clear(D), ontable(C), ontable(D)}. 

An interpretation of $s_1$ results in a state as depicted on the left-hand side 
of figure 2.1, an interpretation of $s_2$ results in a state as depicted on the 
right-hand side. 
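
One straightforward way to mirror this definition in code is to store a state as a set of ground atoms and to read the closed world assumption as "an atom is true exactly if it is a member of the set". The Python sketch below assumes an encoding of atoms as tuples (relation name followed by constants); the encoding and the helper `holds` are illustrative choices of ours, not part of the Strips definition.

```python
# Atoms are encoded as tuples: relation symbol followed by constant symbols.
s1 = frozenset({
    ("on", "B", "C"),
    ("clear", "A"), ("clear", "B"), ("clear", "D"),
    ("ontable", "A"), ("ontable", "C"), ("ontable", "D"),
})


def holds(atom, state):
    """Closed world assumption: atoms not listed in the state are false."""
    return atom in state


print(holds(("on", "B", "C"), s1))   # listed explicitly
print(holds(("on", "A", "B"), s1))   # not listed, hence false
```

Using `frozenset` keeps states immutable and hashable, which is convenient when states are later used as nodes of a search graph.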

The relational symbols on(x, y), clear(x), and ontable(x) can change their 
truth values over different states. There might be further relational symbols, 
denoting relations which are constant over all states, as for example the 
color of blocks (if there is no operator paint-block). Relational symbols which 
can change their truth values are called fluents, constant relational symbols 
are called statics. We will see below that fluents can appear in operator 
preconditions and effects, while statics can only appear in preconditions. 

^ Note that we do not introduce different representations to discern between 
syntactical expressions and their semantic interpretation. We will speak of constant 
or relation symbols when referring to the syntax and of constants or relations 
when referring to the semantics of a planning domain. 

Before introducing goals and operators defined over $\mathcal{L}_S(\mathcal{X}, \mathcal{C}, \mathcal{R})$, we 
introduce matching of formulas containing variables (also called patterns) with 
ground formulas. 

Definition 2.1.3 (Substitution and Matching). A substitution is a set 
of mappings $\sigma = \{x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n\}$ defining replacements of 
variables $x_i$ by terms $t_i$. For language $\mathcal{L}_S$, terms $t_i$ are restricted to variables and 
constants. By applying a substitution $\sigma$ to a formula $F$ - denoted $F_\sigma$ - all 
variables $x_i \in \mathcal{X}(F)$ with $x_i \leftarrow t_i \in \sigma$ are replaced by the associated $t_i$. Note 
that this replacement is unique, i.e., identical variables are replaced by 
identical terms. We call $F_\sigma$ an instantiated formula if all $\mathcal{X}(F)$ are replaced by 
constant symbols. 

For a formula $F \in \mathcal{L}_S$ and a set of atoms $A \in \mathcal{L}_S(\mathcal{C}, \mathcal{R})$, $match(F, A) = \Sigma$ 
gives all substitutions $\sigma_i \in \Sigma$ with $F_{\sigma_i} \subseteq A$. 

For state $s_1$ given above, the formula $F$ = {ontable(x), clear(x), clear(y)} 
can be instantiated to $\Sigma = \{\sigma_1, \sigma_2, \sigma_3, \sigma_4\}$ with $\sigma_1 = \{x \leftarrow A, y \leftarrow B\}$, 
$\sigma_2 = \{x \leftarrow A, y \leftarrow D\}$, $\sigma_3 = \{x \leftarrow D, y \leftarrow A\}$, $\sigma_4 = \{x \leftarrow D, y \leftarrow B\}$. 
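
The matching example can be reproduced with a small backtracking sketch. The Python code below is an illustration under our own assumptions: atoms are tuples, variables are strings starting with "?", and the function name `match` together with its `distinct` flag are hypothetical. With `distinct=True`, different variables must be bound to different constants, which yields exactly the four substitutions listed above; without that restriction, the bindings with both variables mapped to the same block would be found as well.

```python
def match(pattern, atoms, distinct=True):
    """All substitutions sigma such that every instantiated literal of
    `pattern` is contained in the set `atoms` (hypothetical helper)."""
    def extend(literals, sigma):
        if not literals:
            yield sigma
            return
        (pred, *args), rest = literals[0], literals[1:]
        for atom in atoms:
            if atom[0] != pred or len(atom) != len(args) + 1:
                continue
            new = dict(sigma)
            # Variables must be bound consistently; constants must be equal.
            consistent = all(
                new.setdefault(a, c) == c if a.startswith("?") else a == c
                for a, c in zip(args, atom[1:])
            )
            if consistent and (not distinct or len(set(new.values())) == len(new)):
                yield from extend(rest, new)
    return list(extend(list(pattern), {}))


s1 = frozenset({
    ("on", "B", "C"),
    ("clear", "A"), ("clear", "B"), ("clear", "D"),
    ("ontable", "A"), ("ontable", "C"), ("ontable", "D"),
})
F = [("ontable", "?x"), ("clear", "?x"), ("clear", "?y")]

for sigma in match(F, s1):
    print(sigma)
```

The order in which the substitutions are printed depends on set iteration order and carries no meaning.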

For the quantifier-free language $\mathcal{L}_S(\mathcal{X}, \mathcal{C}, \mathcal{R})$, all variables occurring in 
formulas are assumed to be bound by existential quantifiers. That is, all formulas 
correspond to conjunctions of propositions. 

Definition 2.1.4 (Goal Representation). A goal $\mathcal{G}$ is a conjunction of 
literals. That is, $\mathcal{G} \in \mathcal{L}_S(\mathcal{C}, \mathcal{R})$. 

An example of a planning goal is $\mathcal{G}$ = {on(A, B), on(B, C)}. All states $s$ 
with $\mathcal{G}_\sigma \subseteq s$ are goal states. 
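
For a ground goal, the goal test reduces to a subset check. The following Python sketch assumes the tuple encoding of atoms used for illustration in this chapter; the helper name `is_goal_state` and the state `tower` (block D on top of the tower A, B, C) are our own examples.

```python
def is_goal_state(goal, state):
    """A state fulfills a ground goal if all goal atoms are contained in it."""
    return goal <= state


goal = frozenset({("on", "A", "B"), ("on", "B", "C")})

s1 = frozenset({
    ("on", "B", "C"),
    ("clear", "A"), ("clear", "B"), ("clear", "D"),
    ("ontable", "A"), ("ontable", "C"), ("ontable", "D"),
})
s2 = frozenset({
    ("on", "A", "B"), ("on", "B", "C"),
    ("clear", "A"), ("clear", "D"),
    ("ontable", "C"), ("ontable", "D"),
})
tower = frozenset({
    ("on", "D", "A"), ("on", "A", "B"), ("on", "B", "C"),
    ("clear", "D"), ("ontable", "C"),
})

print(is_goal_state(goal, s1), is_goal_state(goal, s2), is_goal_state(goal, tower))
```

The tower with D on top also passes the test, since the goal neither demands that A is clear nor that C is on the table.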

Definition 2.1.5 (Strips Operator). A Strips operator $op$ is described by 
preconditions PRE, ADD- and DEL-lists^, with $PRE, ADD, DEL \in \mathcal{L}_S$. 
ADD and DEL describe the operator effect. An instantiated operator $o = 
op_\sigma \in \mathcal{L}_S(\mathcal{C}, \mathcal{R})$ is called action. We write $PRE(o)$, $ADD(o)$, $DEL(o)$ to 
refer to the precondition, ADD-, or DEL-list of an (instantiated) operator. 

Operators with variables are also called operator schemes. 

An example for a Strips operator is: 

Operator: put(?x, ?y) 
PRE: {ontable(?x), clear(?x), clear(?y)} 
ADD: {on(?x, ?y)} 
DEL: {ontable(?x), clear(?y)} 

^ More exactly, the literals given in ADD and DEL are sets. 
This operator can, for example, be instantiated to 

Operator: put(A, B) 
PRE: {ontable(A), clear(A), clear(B)} 
ADD: {on(A, B)} 
DEL: {ontable(A), clear(B)} 
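
Instantiating an operator scheme amounts to applying one substitution to all three lists. The Python sketch below assumes a dictionary encoding of the put scheme with the keys PRE, ADD, and DEL; this encoding and the helper names `substitute` and `instantiate` are hypothetical and serve only to illustrate the step from scheme to action.

```python
def substitute(literal, sigma):
    """Replace the variables of one literal according to sigma."""
    pred, *args = literal
    return (pred, *[sigma.get(a, a) for a in args])


def instantiate(scheme, sigma):
    """Apply sigma to the PRE-, ADD-, and DEL-list of an operator scheme."""
    return {
        name: frozenset(substitute(lit, sigma) for lit in literals)
        for name, literals in scheme.items()
    }


PUT = {
    "PRE": [("ontable", "?x"), ("clear", "?x"), ("clear", "?y")],
    "ADD": [("on", "?x", "?y")],
    "DEL": [("ontable", "?x"), ("clear", "?y")],
}

put_A_B = instantiate(PUT, {"?x": "A", "?y": "B"})
for name in ("PRE", "ADD", "DEL"):
    print(name, sorted(put_A_B[name]))
```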



We will see below (fig. 2.2) that a second variant of the put operator is 
needed for the case that block x is lying on another block z. In a 
blocks-world with additional relations green(A), green(B), red(C), red(D), we could 
restrict the application conditions for put further, for example such that only 
green blocks are allowed to be moved. Assuming that a put action does not 
affect the color of a block, red(x) and green(x) are static symbols. 

A usual restriction of operator instantiation is, that different variables 
have to be instantiated with different constant symbols.^ 

Definition 2.1.6 ((Forward) Operator Application). For a state $s$ and
an instantiated operator $o$, operator application is defined as
$Res(o, s) = s \setminus DEL(o) \cup ADD(o)$ if $PRE(o) \subseteq s$.

Note that subtracting $DEL(o)$ and adding $ADD(o)$ are commutative (resulting
in the same successor state) only if $DEL(o) \cap ADD(o) = \emptyset$,
$DEL(o) \subseteq s$, and $ADD(o) \cap s = \emptyset$. The ADD-list of an
operator might contain free variables, i.e., variables which do not occur as
arguments of the relational symbols in the precondition. This means that
matching might only result in partial instantiations. The remaining variables
have to be instantiated from the set of constant symbols $\mathcal{C}$ given
for the current planning problem.
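
Definition 2.1.6 translates almost literally into set operations. The
following Python sketch is an illustration only: the dictionary layout of an
action and the function name `res` are assumptions, and a return value of
`None` is used here to signal that the operator is not applicable.

```python
def res(o, s):
    """Forward operator application Res(o, s) following definition 2.1.6."""
    if not o["PRE"] <= s:              # PRE(o) must be contained in s
        return None                    # the action is not applicable
    return (s - o["DEL"]) | o["ADD"]   # first subtract DEL(o), then add ADD(o)

# State s1 and the instantiated operator put(A, B) from the running example.
s1 = {("on", "B", "C"), ("clear", "A"), ("clear", "B"), ("clear", "D"),
      ("ontable", "A"), ("ontable", "C"), ("ontable", "D")}
put_A_B = {
    "PRE": {("ontable", "A"), ("clear", "A"), ("clear", "B")},
    "ADD": {("on", "A", "B")},
    "DEL": {("ontable", "A"), ("clear", "B")},
}

s2 = res(put_A_B, s1)
print(sorted(s2))
```

Applying put(A, B) a second time, now in `s2`, returns `None`, because
ontable(A) and clear(B) no longer hold.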

Operator application gives us a syntactic rule for changing one state
representation into another one by adding and subtracting atoms. On the
semantic side, operator application describes state transitions in a state
space (Newell & Simon, 1972; Nilsson, 1980), where a relation
$s' = Res(o, s)$ denotes that a state $s$ can be transformed into a state
$s'$. (Syntactic) operator application is admissible if $s' = Res(o, s)$ holds
in the state space underlying the given planning problem, i.e., if (1) $s'$
denotes a state in the world and if (2) this state can be reached by applying
the action characterized by $o$ in state $s$.

For the left-hand state in figure 2.1 ($s_1$) and the instantiated put
operator given above, $PRE(o) \subseteq s$ holds and $Res(o, s)$ results in
the right-hand state in figure 2.1 ($s_2$).

A Strips domain is given as a set of operators. Extensions of Strips such as
PDDL allow inclusion of additional information, such as types (see sect. 2.2).
A Strips planning problem is given as $P(\mathcal{O}, \mathcal{I}, \mathcal{G})$
with $\mathcal{O}$ as set of operators, $\mathcal{I}$ as set of initial states,
and $\mathcal{G}$ as set of top-level goals. An example of a Strips planning
problem is given in figure 2.2.



Footnote: The planning language PDDL (see sect. 2.2) allows explicit use of
equality and inequality constraints, such that different variables can be
instantiated with the same constant symbol if no inequality constraint is
given.

Operators:
  put(?x, ?y)
    PRE: {ontable(?x), clear(?x), clear(?y)}
    ADD: {on(?x, ?y)}
    DEL: {ontable(?x), clear(?y)}
  put(?x, ?y)
    PRE: {on(?x, ?z), clear(?x), clear(?y)}
    ADD: {on(?x, ?y), clear(?z)}
    DEL: {on(?x, ?z), clear(?y)}
  puttable(?x)
    PRE: {clear(?x), on(?x, ?y)}
    ADD: {ontable(?x), clear(?y)}
    DEL: {on(?x, ?y)}

Goal:          {on(A, B), on(B, C)}
Initial State: {on(D, C), on(C, A), clear(D), clear(B), ontable(A), ontable(B)}

Fig. 2.2. A Strips Planning Problem in the Blocks-World



While the language in which domains and problems can be represented is
clearly defined, there is no unique way of modeling domains and problems.
In figure 2.2, for example, we decided to describe states with the relations
on, ontable, and clear. There are two different put operators: one is applied
if block x is lying on the table, and the other if block x is lying on another
block z.

An alternative representation of the blocks-world domain is given in figure
2.3. Here, not only the blocks but also the table is considered an object
of the domain. Unary static relations are used to represent that the constant
symbols A, B, C, D are of type "block". Now all operators can be represented
as put(x, y). The first variant describes what happens if a block is moved
from the table and put on another block, the second variant describes how a
block is moved from one block to another, and the third variant describes how
a block is put on the table. Another alternative would be to represent put as
a ternary operator put(block, from, to).

Another decision which influences the representation of domains and problems
concerns the level of detail. We have completely abstracted from the agent
who executes the actions. If a plan is intended for execution by a robot, it
becomes necessary to represent additional operators, such as picking up an
object and holding an object, and states have to be described with additional
literals, for example which block the agent is currently holding (see
fig. 2.4).

Operators:
  put(?x, ?y)
    PRE: {on(?x, Table), block(?x), block(?y), clear(?x), clear(?y)}
    ADD: {on(?x, ?y)}
    DEL: {on(?x, Table), clear(?y)}
  put(?x, ?y)
    PRE: {on(?x, ?z), block(?x), block(?y), block(?z), clear(?x), clear(?y)}
    ADD: {on(?x, ?y), clear(?z)}
    DEL: {on(?x, ?z), clear(?y)}
  put(?x, Table)
    PRE: {clear(?x), on(?x, ?y), block(?x), block(?y)}
    ADD: {on(?x, Table), clear(?y)}
    DEL: {on(?x, ?y)}

Goal:          {on(A, B), on(B, C)}
Initial State: {on(D, C), on(C, A), on(A, Table), on(B, Table),
                clear(D), clear(B), block(A), block(B), block(C), block(D)}

Fig. 2.3. An Alternative Representation of the Blocks-World

2.1.3 Backward Operator Application

The task of a planning algorithm is to calculate sequences of transformations
from states $s_0 \in \mathcal{I}$ to states $s_G$ with
$\mathcal{G} \subseteq s_G$. The planning problem is solved if such a
transformation sequence, called a plan, is found. Often, it is also required
that the transformations are optimal. Optimality can be defined as a minimal
number of actions or, for operators associated with different costs, as an
action sequence with a minimal sum of costs.

Common to all state-based planning algorithms is that plan construction 
can be characterized as search in the state space. Each planning step involves 
the selection of an action. Finding a plan in general involves backtracking 
over such selections. To guarantee termination for the planning algorithm, it 
is necessary to keep track of the states already constructed, to avoid cycles. 
Search in the state space can be performed forward - from an initial state to 
a goal state or backward - from a goal state to an initial state. Backward 
planning is based on backward operator application: 

Definition 2.1.7 (Backward Operator Application). For a state $s$ and
an instantiated operator $o$, backward operator application is defined as
$Res^{-1}(o, s) = s \setminus ADD(o) \cup (DEL(o) \cup PRE(o))$
if $ADD(o) \subseteq s$.



For state

$s_2$ = {on(A, B), on(B, C), clear(A), clear(D), ontable(C), ontable(D)}

backward application of the instantiated operator put(A, B) given above
results in

$s_2 \setminus$ {on(A, B)} $\cup$ {ontable(A), clear(A), clear(B)} =
{on(B, C), clear(A), clear(B), clear(D), ontable(A), ontable(C), ontable(D)}
= $s_1$.

Backward operator application is sound if for $Res^{-1}(o, s) = s'$ it holds
that $Res(o, s') = s$ for all states of the domain. Backward operator
application is complete if for all $Res(o, s') = s$ it holds that
$Res^{-1}(o, s) = s'$ (see sect. 3.3).

Originally, backward operator application was not defined for "complete"
state descriptions (i.e., an enumeration of all atoms over $\mathcal{R}$ which
hold in a current state) but for conjunctions of (sub-)goals. We will discuss
so-called goal regression in section 2.3.4.
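
Definition 2.1.7 and the example for state $s_2$ can be sketched in Python as
follows. The function names `res` and `res_inv` and the dictionary layout of
an action are assumptions made for illustration; `None` signals that the
operator is not applicable in the given direction.

```python
def res(o, s):
    """Forward application (definition 2.1.6)."""
    if not o["PRE"] <= s:
        return None
    return (s - o["DEL"]) | o["ADD"]

def res_inv(o, s):
    """Backward application Res^-1(o, s) (definition 2.1.7)."""
    if not o["ADD"] <= s:              # ADD(o) must be contained in s
        return None
    return (s - o["ADD"]) | o["DEL"] | o["PRE"]

put_A_B = {
    "PRE": {("ontable", "A"), ("clear", "A"), ("clear", "B")},
    "ADD": {("on", "A", "B")},
    "DEL": {("ontable", "A"), ("clear", "B")},
}
s2 = {("on", "A", "B"), ("on", "B", "C"), ("clear", "A"), ("clear", "D"),
      ("ontable", "C"), ("ontable", "D")}

s1 = res_inv(put_A_B, s2)
print(sorted(s1))
# Soundness check for this single case: going forward again restores s2.
print(res(put_A_B, s1) == s2)
```

The final comparison checks the soundness condition for this one
state-action pair only; it says nothing about the domain as a whole.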

Note that while plan construction can be performed by forward and back- 
ward operator applications, plan execution is always performed by forward 
operator application - transforming an initial state into a goal state. 



2.2 Extensions and Alternatives to Strips 

Since the introduction of the Strips language in the seventies, different exten- 
sions have been introduced by different planning groups, both to make plan- 
ning more efficient and to enlarge the scope to a larger set of domains. The 
extensions were mainly influenced by work from Pednault (1987, 1994) who 
proposed ADL (action description language) as a more expressive but still 
efficient alternative to Strips. The language PDDL (Planning Domain Defini- 
tion Language) can be seen as a synthesis of all language features which were 
introduced in the different planning systems available today (McDermott, 
1998b). 

Strips and PDDL are based on the closed world assumption, which allows state
transformations to be calculated by adding and deleting literals from state
descriptions. Alternatively, planning can be seen as a logical inference
problem. In that sense, a Prolog interpreter is a planning algorithm.
Situation calculus as a variant of first order logic was introduced by
McCarthy (1963) and McCarthy and Hayes (1969). Although most of today's
planning systems are based on the Strips approach, situation calculus is still
influential in planning research. Basic concepts from situation calculus are
used to reason about semantic properties of (Strips) planning. Furthermore,
deductive planning is based on this representation language.

2.2.1 The Planning Domain Definition Language 

The language PDDL was developed in 1998. Most current planning systems
are based on PDDL specifications as input, and planning problems used at
the AIPS planning competitions are presented in PDDL (McDermott, 1998a;
Bacchus et al., 2000). The development of PDDL was a joint project involving
most of the active planning research groups of the nineties. Thus, it can be
seen as a compromise between the different syntactic representations and
language features available in the major planning systems of today.

The core of PDDL is Strips. An example of the blocks-world representation
in PDDL is given in figure 2.4. For this example, we included aspects of
the behavior of an agent in the domain specification (operators pickup and
putdown, relation symbols arm-empty and holding). Note that ADD- and
DEL-lists are given together in an effect-slot. All positive literals are
added to, and all negated literals are deleted from, the current state.



(define (domain blocksworld)
  (:requirements :strips)
  (:predicates (clear ?x)
               (on-table ?x)
               (arm-empty)
               (holding ?x)
               (on ?x ?y))
  (:action pickup
    :parameters (?ob)
    :precondition (and (clear ?ob) (on-table ?ob) (arm-empty))
    :effect (and (holding ?ob) (not (clear ?ob)) (not (on-table ?ob))
                 (not (arm-empty))))
  (:action putdown
    :parameters (?ob)
    :precondition (holding ?ob)
    :effect (and (clear ?ob) (arm-empty) (on-table ?ob)
                 (not (holding ?ob))))
  (:action stack
    :parameters (?ob ?underob)
    :precondition (and (clear ?underob) (holding ?ob))
    :effect (and (arm-empty) (clear ?ob) (on ?ob ?underob)
                 (not (clear ?underob)) (not (holding ?ob))))
  (:action unstack
    :parameters (?ob ?underob)
    :precondition (and (on ?ob ?underob) (clear ?ob) (arm-empty))
    :effect (and (holding ?ob) (clear ?underob)
                 (not (on ?ob ?underob)) (not (clear ?ob)) (not (arm-empty))))
)

(define (problem tower3)
  (:domain blocksworld)
  (:objects a b c)
  (:init (on-table a) (on-table b) (on-table c)
         (clear a) (clear b) (clear c) (arm-empty))
  (:goal (and (on a b) (on b c)))
)

Fig. 2.4. Representation of a Blocks-World Problem in PDDL-Strips



Extensions of Strips included in PDDL domain specifications are

- Typing,
- Equality constraints,
- Conditional effects,
- Disjunctive preconditions,
- Universal quantification,
- Updating of state variables.

Most modern planners are based on Strips plus the first three extensions.
While the first four extensions mainly result in a higher effectiveness of
plan construction, the last two extensions enlarge the class of domains for
which plans can be constructed.

Unfortunately, PDDL (McDermott, 1998b) mainly provides a syntactic framework
for these features and gives no, or only an informal, description of their
semantics. A semantics for conditional effects and effects with
all-quantification is given in Koehler, Nebel, and Hoffmann (1997). In the
following, we introduce typing, equality constraints, and conditional effects.
Updating of state variables is discussed in detail in chapter 4. An example of
a blocks-world domain specification using an operator with equality
constraints and conditional effects is given in figure 2.5.



(define (domain blocksworld-adl)
  (:requirements :strips :equality :conditional-effects)
  (:predicates (on ?x ?y)
               (clear ?x))            ; clear(Table) is static
  (:action puton
    :parameters (?x ?y ?z)
    :precondition (and (on ?x ?z) (clear ?x) (clear ?y)
                       (not (= ?y ?z)) (not (= ?x ?z))
                       (not (= ?x ?y)) (not (= ?x Table)))
    :effect
      (and (on ?x ?y) (not (on ?x ?z))
           (when (not (eq ?z Table)) (clear ?z))
           (when (not (eq ?y Table)) (not (clear ?y)))))
)

Fig. 2.5. Blocks-World Domain with Equality Constraints and Conditioned Effects



2.2.1.1 Typing and Equality Constraints. The precondition in figure
2.5 contains equality constraints, expressing that all three objects involved
in the puton operator have to be different.

Equality constraints restrict which substitutions are legal in matching a
current state and a precondition. That is, definition 2.1.3 is modified such
that for all pairs $(x \leftarrow t)$, $(x' \leftarrow t')$ in a substitution
$\sigma$, $t \neq t'$ has to hold if (not (= x x')) is specified in the
precondition of the operator.
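
The effect of explicit inequality constraints on the set of legal
substitutions can be illustrated with a small Python sketch. The function
name `substitutions` and the encoding of a constraint (not (= x y)) as the
pair ("x", "y") are assumptions made for this illustration.

```python
from itertools import product

def substitutions(variables, constants, inequalities=()):
    """Enumerate substitutions that respect the given inequality constraints.

    Each entry (a, b) in inequalities stands for (not (= a b)).
    """
    for combo in product(sorted(constants), repeat=len(variables)):
        sigma = dict(zip(variables, combo))
        if all(sigma[a] != sigma[b] for a, b in inequalities):
            yield sigma

# Without a constraint, x and y may be bound to the same constant.
unconstrained = list(substitutions(["x", "y"], {"A", "B"}))
# With (not (= x y)), only bindings to different constants remain.
constrained = list(substitutions(["x", "y"], {"A", "B"}, [("x", "y")]))

print(len(unconstrained), len(constrained))
```

With two constants, the unconstrained case yields four substitutions and the
constrained case two.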

Instead of using explicit equality constraints, matching can be defined
such that variables with different names must generally be instantiated with
different constants. But explicit equality constraints give more expressive
power: giving no equality constraint for two variables x and y allows
these variables to be instantiated by different constants or by the same
constant. Using "implicit" equality constraints makes it necessary to specify
two different operators, one with only one variable name ($x = y$), and one
with both variable names ($x \neq y$).

Besides equality constraints, types can be used to restrict matching. Let
us assume a blocks-world where movable objects consist of blocks and of
pyramids and where no objects can be put on top of a pyramid. We can
introduce the following hierarchy of types:

- table
- block
- pyramid
- movable-object: block, pyramid
- flattop-object: table, block.

Operator puton(x, y, z) can now be defined over typed variables
x:movable-object, y:flattop-object, z:flattop-object. When defining a problem
for that extended blocks-world domain, each constant must be declared together
with a type.

Typing can be simulated in standard Strips using static relations. A simple
example of typing is given in figure 2.3. An example covering the type
hierarchy from above is: {table(T), flattop-object(T), block(B),
movable-object(B), flattop-object(B), pyramid(A), movable-object(A)}. The
operator precondition can then be extended by {movable-object(x),
flattop-object(y), flattop-object(z)}.

2.2.1.2 Conditional Effects. Conditional effects make it possible to represent
context dependent effects of actions. In the blocks-world specification given
in figure 2.3, three different operators were used to describe the possible
effects of putting a block somewhere else (on another block or the table). In
contrast, in figure 2.5, only one operator is needed. All variants of
puton(x, y, z) have some general application restrictions, specified in the
precondition: block x is lying on something (z is another block or the table);
and both x and y are clear, where clear(Table) is static, i.e., holds in all
possible situations. Additionally, the equality constraints in the
precondition specify that all objects involved have to be different.
Regardless of the context in which puton is applied, the result is that x is
no longer lying on z, but on y. This context independent effect is specified
in the first line of the effect. The next two lines specify additional
consequences: If x was not lying on the table, but on another block z, this
block z is clear after operator application; and if block x was not put on the
table, but on a block y, this block y is no longer clear after operator
application. Preconditions for conditioned effects are also called secondary
preconditions; conditioned effects are also called secondary effects
(Penberthy & Weld, 1992; Fink & Yang, 1997).

The main advantage of conditional effects is that plan construction becomes
more efficient: In every planning step, all operators must be matched with
the current state representation, and all successfully matched operators are
possible candidates for application. If the general precondition of an
operator with conditioned effects does not match the current state, it can be
rejected immediately, while for the unconditioned variants given in figure 2.3
all three preconditions have to be checked.

The semantics of applying an operator with conditioned effects is:

Definition 2.2.1 (Operator Application with Conditional Effects).
Let PRE be the general precondition of an operator, and ADD and DEL
the unconditioned effects. Let $PRE_i$, $ADD_i$, $DEL_i$, with
$i = 0 \ldots n$, be context conditions with their associated context effects.
For a state $s$ and an instantiated operator $o$, operator application is
defined as

$Res(o, s) = s \setminus (DEL(o) \cup \bigcup_{i \in M} DEL_i(o)) \cup (ADD(o) \cup \bigcup_{i \in M} ADD_i(o))$

if $PRE(o) \subseteq s$ and $M = \{i \mid PRE_i(o) \subseteq s\}$.

Backward application is defined as

$Res^{-1}(o, s) = s \setminus (ADD(o) \cup \bigcup_{i \in M} ADD_i(o)) \cup [(DEL(o) \cup \bigcup_{i \in M} DEL_i(o)) \cup (PRE(o) \cup \bigcup_{i \in M} PRE_i(o))]$

if $ADD(o) \subseteq s$ and $M = \{i \mid ADD_i(o) \subseteq s\}$.
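
The forward part of definition 2.2.1 can be sketched in Python as follows.
This is an illustration under explicit assumptions: the name
`res_conditional` and the dictionary layout (a list of contexts under the key
"WHEN") are hypothetical, and the inequality tests of figure 2.5 are replaced
by the static relation block, so that each context condition $PRE_i$ is a
plain set of atoms as required by the definition.

```python
def res_conditional(o, s):
    """Forward application with conditional effects (definition 2.2.1)."""
    if not o["PRE"] <= s:
        return None
    # M: all contexts whose secondary precondition holds in s.
    active = [c for c in o["WHEN"] if c["PRE"] <= s]
    delete = set(o["DEL"]).union(*(c["DEL"] for c in active))
    add = set(o["ADD"]).union(*(c["ADD"] for c in active))
    return (s - delete) | add

s = {("on", "A", "Table"), ("on", "B", "Table"), ("clear", "A"),
     ("clear", "B"), ("clear", "Table"), ("block", "A"), ("block", "B")}

# Instantiated puton(A, B, Table): move A from the table onto B.
puton_A_B_Table = {
    "PRE": {("on", "A", "Table"), ("clear", "A"), ("clear", "B")},
    "ADD": {("on", "A", "B")},
    "DEL": {("on", "A", "Table")},
    "WHEN": [
        # z is a block: z becomes clear (not active here, since z = Table).
        {"PRE": {("block", "Table")}, "ADD": {("clear", "Table")}, "DEL": set()},
        # y is a block: y is no longer clear (active here, since y = B).
        {"PRE": {("block", "B")}, "ADD": set(), "DEL": {("clear", "B")}},
    ],
}

print(sorted(res_conditional(puton_A_B_Table, s)))
```

In this example only the second context is active, so clear(B) is deleted
while clear(Table) remains untouched.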

2.2.2 Situation Calculus 

Situation calculus was introduced by McCarthy (McCarthy, 1963; McCarthy
& Hayes, 1969) to describe state transitions in first order logic. The world
is conceived as a sequence of situations, and situations are generated from
previous situations by actions. Situations are, like Strips state
representations, necessarily incomplete representations of the states in the
world.

Relations which can change over time are called fluents. Each fluent has
an extra argument for representing a situation. For example, $clear(a, s_1)$
denotes that block a is clear in a situation referred to as $s_1$. Changes in
the world are represented by a function Res(action, situation) = situation.
For example, we can write $s_2 = Res(put(a, b), s_1)$ to describe that
applying put(a, b) in situation $s_1$ results in a situation $s_2$. Note that
the concept of fluents is also used in Strips. Although fluents are not
specially marked there, it can be inferred from the operator effects which
relational symbols correspond to fluent relations. We already used the
Res-function to describe the semantics of operator applications in Strips
(sect. 2.1.1). In situation calculus, the result of an operator application
is not calculated by adding and deleting literals from a state but by logical
inference.

The first implemented system using situation calculus for automated plan
construction was proposed by Green (1969) with the QA3 system. As an
alternative to the explicit use of a Res-function, not only fluents but also
actions are provided with an additional argument for situations. For example,
we can write $s_2 = put(a, b, s_1)$ to describe that block a is put on block
b in situation $s_1$, resulting in a new situation $s_2$. Green's inference
system is based on resolution. For example, the following two axioms might be
given (the first axiom is a specific fact):

Footnote (notation): We represent constants with small letters and variables
with large letters.

Footnote (resolution): We do not introduce resolution in a formal way but give
an illustrative example. We assume that the reader is familiar with the basic
concepts of theorem proving, as they are introduced in logic or AI textbooks.

A1  $on(a, table, s_1)$

A2  $\forall S\,[on(a, table, S) \rightarrow on(a, b, put(a, b, S))] \equiv$
    $\neg on(a, table, S) \vee on(a, b, put(a, b, S))$ (clausal form).

Green's theorem prover provides two results: first, it infers whether some
formula representing a planning goal follows from the axioms, and second,
it provides the action sequence, i.e., the plan, if the goal statement can be
derived. Not only a yes/no answer but also a plan for achieving the goal can
be returned because a so-called answer literal is introduced. The answer
literal is initially given as answer(S), and at each resolution step the
variable S contained in the answer literal is instantiated in accordance
with the involved formulas.

For example, we can ask the theorem prover whether there exists a situation
$S_F$ in which $on(a, b, S_F)$ holds, given the pre-defined axioms. If such
a situation $S_F$ exists, answer($S_F$) will be instantiated with the plan
throughout the resolution proof. Resolution proofs work by contradiction.
That is, we start with the negated goal:

1. $\neg on(a, b, S_F)$                                   (Negation of the theorem)
2. $\neg on(a, table, S) \vee on(a, b, put(a, b, S))$     (A2)
3. $\neg on(a, table, S)$                                 (Resolve 1, 2)
   answer(put(a, b, S))
4. $on(a, table, s_1)$                                    (A1)
5. contradiction                                          (Resolve 3, 4)
   answer(put(a, b, $s_1$))

The resolution proof shows that a situation $s_2 = put(a, b, s_1)$ with
$on(a, b, s_2)$ exists and that $s_2$ can be reached by putting a on b in
situation $s_1$.

Situation calculus has the full expressive power of first order predicate
logic. Because logical inference does not rely on the closed world assumption,
specifying a planning domain involves much more effort than in Strips: In
addition to axioms describing the effects of operator applications, frame
axioms, describing which predicates remain unaffected by operator
applications, have to be specified.

Frame axioms always become necessary when the goal is not only a single
literal but a conjunction of literals. For illustration, we extend the example
given above:

A3  $on(a, table, s_1)$
A4  $on(b, table, s_1)$
A5  $on(c, table, s_1)$
A6  $\neg on(X, table, S) \vee on(X, Y, put(X, Y, S))$
A7  $\forall S\,[on(Y, Z, S) \rightarrow on(Y, Z, put(X, Y, S))] \equiv$
    $\neg on(Y, Z, S) \vee on(Y, Z, put(X, Y, S))$
A8  $\forall S\,[on(X, table, S) \rightarrow on(X, table, put(Y, Z, S))] \equiv$
    $\neg on(X, table, S) \vee on(X, table, put(Y, Z, S))$.

Axiom A6 corresponds to axiom A2 given above, now stated in a more general
form, abstracting from blocks a and b. Axiom A7 is a frame axiom, stating
that a block Y is still lying on a block Z after a block X was put on block
Y. Axiom A8 is a frame axiom, stating that a block X is still lying on the
table if a block Y is put on a block Z. Note that the given axioms are only
a subset of the information needed for modeling the blocks-world domain. A
complete axiomatization is given in figure 2.6 (where we represent the axioms
in a Prolog-like notation, writing the conclusion side of an implication on
the left-hand side).



Effect Axioms:
  on(X, Y, put(X, Y, S))      $\leftarrow$ clear(X, S) $\wedge$ clear(Y, S)
  clear(Z, put(X, Y, S))      $\leftarrow$ on(X, Z, S) $\wedge$ clear(X, S) $\wedge$ clear(Y, S)
  clear(Y, puttable(X, S))    $\leftarrow$ on(X, Y, S) $\wedge$ clear(X, S)
  ontable(X, puttable(X, S))  $\leftarrow$ clear(X, S)
Frame Axioms:
  clear(X, put(X, Y, S))      $\leftarrow$ clear(X, S) $\wedge$ clear(Y, S)
  clear(Z, put(X, Y, S))      $\leftarrow$ clear(X, S) $\wedge$ clear(Y, S) $\wedge$ clear(Z, S)
  ontable(Y, put(X, Y, S))    $\leftarrow$ clear(X, S) $\wedge$ clear(Y, S) $\wedge$ ontable(Y, S)
  ontable(Z, put(X, Y, S))    $\leftarrow$ clear(X, S) $\wedge$ clear(Y, S) $\wedge$ ontable(Z, S)
  on(Y, Z, put(X, Y, S))      $\leftarrow$ clear(X, S) $\wedge$ clear(Y, S) $\wedge$ on(Y, Z, S)
  on(W, Z, put(X, Y, S))      $\leftarrow$ clear(X, S) $\wedge$ clear(Y, S) $\wedge$ on(W, Z, S)
  clear(Z, puttable(X, S))    $\leftarrow$ clear(X, S) $\wedge$ clear(Z, S)
  ontable(Z, puttable(X, S))  $\leftarrow$ clear(X, S) $\wedge$ ontable(Z, S)
  on(Y, Z, puttable(X, S))    $\leftarrow$ clear(X, S) $\wedge$ on(Y, Z, S)
  clear(Z, puttable(X, S))    $\leftarrow$ on(Y, X, S) $\wedge$ clear(Y, S) $\wedge$ clear(Z, S)
  ontable(Z, puttable(X, S))  $\leftarrow$ on(Y, X, S) $\wedge$ clear(Y, S) $\wedge$ ontable(Z, S)
  on(W, Z, puttable(X, S))    $\leftarrow$ on(Y, X, S) $\wedge$ clear(Y, S) $\wedge$ on(W, Z, S)
Facts (Initial State):
  on(d, c, $s_1$)
  on(c, a, $s_1$)
  clear(d, $s_1$)
  clear(b, $s_1$)
  ontable(a, $s_1$)
  ontable(b, $s_1$)
Theorem (Goal):
  on(a, b, S) $\wedge$ on(b, c, S)

Fig. 2.6. Representation of a Blocks-World Problem in Situation Calculus



For the goal $\exists S_F\,[on(a, b, S_F) \wedge on(b, c, S_F)]$, the
resolution proof is as follows. (Footnote: When introducing a new clause, we
rename variables such that there can be no confusion, as usual for
resolution.)

 1. $\neg on(a, b, S_F) \vee \neg on(b, c, S_F)$                   (Negation of the theorem)
 2. $\neg on(X, table, S) \vee on(X, Y, put(X, Y, S))$             (A6)
 3. $\neg on(b, c, put(a, b, S)) \vee \neg on(a, table, S)$        (Resolve 1, 2)
 4. $\neg on(Y, Z, S') \vee on(Y, Z, put(X, Y, S'))$               (A7)
 5. $\neg on(a, table, S') \vee \neg on(b, c, S')$                 (Resolve 3, 4)
 6. $\neg on(X, table, S) \vee on(X, Y, put(X, Y, S))$             (A6)
 7. $\neg on(a, table, put(b, c, S)) \vee \neg on(b, table, S)$    (Resolve 5, 6)
 8. $\neg on(X, table, S') \vee on(X, table, put(Y, Z, S'))$       (A8)
 9. $\neg on(b, table, S) \vee \neg on(a, table, S)$               (Resolve 7, 8)
10. $on(b, table, s_1)$                                            (A4)
11. $\neg on(a, table, S)$                                         (Resolve 9, 10)
12. $on(a, table, s_1)$                                            (A3)
13. contradiction.



Resolution gives us an inference rule for proving that a goal theorem 
follows from a set of axioms and facts. For automated plan construction, ad- 
ditionally a strategy which guides the search for the proof (i. e., the sequence 
in which axioms and facts are introduced into the resolution steps) is needed. 
An example for such a strategy is SLD-resolution as used in Prolog (Sterling 
& Shapiro, 1986). In general, finding a proof involves backtracking over the 
resolution steps and over the ordering of goals. 

The main reason why Strips and not situation calculus became the standard
for domain representations in planning is certainly that it is much more time
consuming and also much more error-prone to represent a domain using effect
and frame axioms than to model only operator preconditions and effects.
Additionally, special purpose planning algorithms are naturally more
efficient than general theorem provers. Furthermore, for a long time, the
restricted expressiveness of Strips was considered sufficient for representing
domains which are of interest in plan construction. When more interesting
domains, for example domains involving resource constraints (see chap. 4),
were considered in planning research, the expressiveness of Strips was
extended to PDDL. But still, PDDL is a more restricted language than
situation calculus. Due to the progress in automatic theorem proving over the
last decade, the efficiency concerns which caused the prominence of
state-based planning might no longer hold (Bibel, 1986). Therefore, it might
be of interest again to compare current state-based and current deductive
(situation calculus) approaches.



2.3 Basic Planning Algorithms 

In general, a planning algorithm is a special purpose search algorithm.
State-based planners search in the state space, as briefly described in
section 2.1.1. Another variant of planners, called partial-order planners,
search in the so-called plan space. Deductive planners search in the "proof
space", i.e., the possible orderings of resolution steps. Basic search
algorithms are introduced in all introductory algorithm textbooks, for example
in Cormen, Leiserson, and Rivest (1990).

In the following, we will first introduce basic concepts for plan
construction. Then forward planning is described, followed by a discussion of
complexity results for planning and formal properties of plans. Finally, we
introduce backward planning.

2.3.1 Informal Introduction of Basic Concepts 

The definitions for forward and backward operator application given above 
(def. 2.1.6 and def. 2.1.7) are the crucial component for plan construction: 
The transformation of a given state into a next state by operator applica- 
tion constitutes one planning step. In general, each planning step consists of 
matching all operators with the current state description, selecting one in- 
stantiated operator which is applicable in the current state, and applying this 
operator. Plan construction involves a series of such match-select-apply 
cycles. In each planning step one operator is selected for application, that 
is, plan construction is based on depth-first search and operator selection is a 
backtrack point. During plan construction, a search tree is generated. State 
descriptions are nodes, action applications are arcs in the search tree. Each 
planning step expands the current leaf node s of the search tree by intro- 
ducing a new action o and the state description s' resulting from applying 
o to s. For forward planning, search starts with an initial state as root; for 
backward planning, search starts with the top-level goals (or a goal state) as 
root. 

Input to a planning algorithm is a planning problem
$P(\mathcal{O}, \mathcal{I}, \mathcal{G})$, as defined in section 2.1.1.
Output of a planning algorithm is a plan. For basic Strips planning, a plan
is a sequence of actions, transforming an initial state into a state
fulfilling the top-level goals. Such an executable plan is also called a
solution. More generally, a plan is a set of operators together with a
set of binding constraints for the variables occurring in the operators and
a set of ordering constraints defining the sequence of operator applications.
If the plan is not executable, it is also called a partial plan. A plan is
not executable if it does not contain all operators necessary to transform an
initial state into a goal state, or if not all variables are instantiated, or
if there is no complete ordering of the operators.

When implementing a planner, at least the following functionalities must
be provided:

- A pattern-matcher and possibly a mechanism for dealing with free variables
  (i.e., variables which are not bound by matching an operator with a set
  of atoms).
- A strategy for selecting an applicable action and for handling backtracking
  (taking back an earlier commitment).
- A mechanism for calculating the effect of an operator application.
- A data structure for storing the partially constructed plan.
- A data structure for holding the information necessary to detect cycles
  (i.e., states which were already generated) to guarantee termination.

Some planning systems calculate a set of actions before plan construction
by instantiating the operators with the given constant symbols. As a
consequence, during plan construction, matching is replaced by a simple
subset-test, where it is checked whether the set of instantiated preconditions
of an operator is completely contained in the set of atoms representing the
current state. Instantiation of operators is realized with respect to the
(possibly typed) constants defined for a problem, either by extracting them
from a given initial state or by using the set of explicitly declared
constants. In general, such a "freely generated" set of actions is a superset
of the set of "legal" actions.
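
Pre-instantiating operators can be sketched in Python as follows. The helper
names `instantiate` and `ground`, the dictionary layout of an operator, and
the restriction that different variables receive different constants are
assumptions made for this illustration. The operator is the first put variant
of figure 2.2, and the constants are extracted from the initial state given
there.

```python
from itertools import permutations

def instantiate(atoms, sigma):
    """Replace variables by constants in a set of atoms."""
    return {tuple(sigma.get(t, t) for t in atom) for atom in atoms}

def ground(op, constants):
    """Generate all instantiations of an operator over the given constants."""
    actions = []
    for combo in permutations(sorted(constants), len(op["params"])):
        sigma = dict(zip(op["params"], combo))
        actions.append({"name": (op["name"], *combo),
                        "PRE": instantiate(op["PRE"], sigma),
                        "ADD": instantiate(op["ADD"], sigma),
                        "DEL": instantiate(op["DEL"], sigma)})
    return actions

put = {"name": "put", "params": ["?x", "?y"],
       "PRE": {("ontable", "?x"), ("clear", "?x"), ("clear", "?y")},
       "ADD": {("on", "?x", "?y")},
       "DEL": {("ontable", "?x"), ("clear", "?y")}}

initial = {("on", "D", "C"), ("on", "C", "A"), ("clear", "D"), ("clear", "B"),
           ("ontable", "A"), ("ontable", "B")}
constants = {arg for atom in initial for arg in atom[1:]}

actions = ground(put, constants)
# Matching is now a plain subset test on the instantiated preconditions.
applicable = [a["name"] for a in actions if a["PRE"] <= initial]
print(len(actions), applicable)
```

Of the twelve freely generated instances of put, only put(B, D) passes the
subset test in the initial state; most of the others can never become
applicable, which illustrates the superset property mentioned above.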

Often planning algorithms do not directly construct a plan as fully instan- 
tiated, totally ordered sequence of actions. Some planners construct partially 
ordered plans where some actions for which no order constraints were deter- 
mined during planning are “parallel” . Some planners store plans as part of a 
more general data structure. In such cases, additional functionalities for plan 
linearization and/or plan extraction must be provided. 

2.3.2 Forward Planning 

Plan construction based on forward operator application starts with an initial 
state as root of the search tree. Plan construction terminates successfully, if 
a state is found which satisfies all top-level planning goals. Forward planning 
is also called progression planning. An informal forward planning algorithm 
is given in table 2.1. 



Table 2.1. Informal Description of Forward Planning

- Until the top-level goals are satisfied in a state or until all possible states are
explored DO

For the current state s:

- MATCH: For all operators op ∈ O calculate all substitutions such that the
operator preconditions are contained in s. Generate a set of action candidates
A = {o | PRE(o) ⊆ s}.

- SELECT: Select one element o from A.

- APPLY: Calculate the successor state Res(s, o) = s'.

- BACKTRACK: If there is no successor state (A is empty), go back to the
predecessor state of s. If the generated successor state is already contained in the
plan, select another element from A.

- PROCEED: Otherwise, insert the selected action in the plan and proceed with
s' as current state.



In this algorithm, we abstract from the data structure for saving a plan
and from the data structure necessary for handling backtracking and for
detection of cycles. One possibility would be to put an action-successor-state
pair on a stack in each planning step. When the algorithm terminates
successfully, the plan corresponds to the sequence of actions on the stack. For
backtracking and cycle detection a second data structure is needed where all
generated and rejected states are saved. Both kinds of information can be
represented together if the complete search history is saved explicitly in a
search tree (see Winston, 1992, chap. 4). A search tree can be represented as a
list of lists:

Definition 2.3.1 (The Data Structure “Search Tree”). A search tree
ST can be represented as a list of lists, where each list represents a path:
ST = nil | cons(path, ST) with

\[
\mathit{empty}(ST) =
\begin{cases}
\mathit{true} & \text{if } ST = \mathit{nil} \\
\mathit{false} & \text{else}
\end{cases}
\]

and selector functions first(cons(path, ST)) = path, rest(cons(path, ST)) = ST.

- A (partially expanded) path is defined as a list of action-state pairs (o s). The
root (initial state) is represented as (nil s). The selector function getaction((o s))
returns the first, and the selector function getstate((o s)) the second argument of
an action-state pair.

- A path is defined as: path = nil | rcons(path, pair) with selector function
last(rcons(path, pair)) = pair.

With getstate(last(path)) a leaf of the search tree is retrieved. For a current state
s = getstate(last(path)) and a list of action candidates A a path is expanded by
expand(path, A) = rcons(path, cons(o, s')) for all o ∈ A and Res(o, s) = s'.

A plan can be extracted from a path with getplan(path) = map(λ(x).getaction(x))(path),
where the higher-order function map describes that function getaction is
applied to the sequence of all elements in path.
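The data structure of definition 2.3.1 can be mirrored almost literally with
Python lists. The following lines are a minimal sketch under two assumptions
which are not part of the definition: expand receives the already calculated
action-state pairs instead of the action candidates, and getplan skips the
nil action of the root. States and actions are plain strings here.

```python
def getaction(pair):
    """First component of an action-state pair (o s)."""
    return pair[0]


def getstate(pair):
    """Second component of an action-state pair (o s)."""
    return pair[1]


def first(st):
    return st[0]


def rest(st):
    return st[1:]


def last(path):
    return path[-1]


def expand(path, pairs):
    """Return one extended copy of the path per action-state pair."""
    return [path + [pair] for pair in pairs]


def getplan(path):
    """Map getaction over the path; the root pair (nil s) is skipped."""
    return [getaction(pair) for pair in path[1:]]


# Root of the search tree: the initial state with the nil action.
st = [[(None, "s0")]]
# Two actions are applicable in s0; both successors are kept as backtrack points.
st = expand(first(st), [("a1", "s1"), ("a2", "s2")]) + rest(st)
print(getstate(last(first(st))))   # s1
print(getplan(first(st)))          # ['a1']
```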

A “fleshed-out” version of the algorithm in table 2.1 is given in table 2.2.
To save the possible backtrack points, all actions which are applicable
in a state are now inserted in the search tree, together with their corresponding
successor states. That is, selection of an action is delayed by one step. The
selection strategy is to always expand the first (“left-most”) path in the search
tree. That is, the algorithm is still depth-first. To guarantee termination, for
the cycle-test it is sufficient to check whether an action results in a state which
is already contained in the current path. Alternatively, it could be checked
whether the new state is already contained anywhere else in the search tree. This
extended cycle check makes sure that a state is always reached with the
shortest possible action sequence but possibly involves more backtracking.
Note that the extended cycle-test does not result in optimal plans: although
every state already included in the search tree is reached with the shortest
possible action sequence, the optimal path might be found in an unexpanded
part of the search tree which does not include this state.

An example for plan construction by forward search is given in figure 
2.7. We use the operators as defined in figure 2.2. The top-level goals are 
on(A, B), on(B, C), and the initial state is on(A, C), ontable(B), ontable(C), 
clear(A), clear(B). For better readability, the states are presented graphically 
and not as set of literals. All generated and detected cycles are presented in 



Table 2.2. A Simple Forward Planner

- Main Function fwplan(O, G, ST)

- Initial Function Call: fwplan(O, 𝒢, [[(nil s)]]) with a set of operators O, an initial
state s, and a set of top-level goals 𝒢

- Return: A plan as sequence of actions which transform s into a state s_𝒢 with
𝒢 ⊆ s_𝒢, extracted from search tree ST.

1. IF empty(ST) THEN “no plan found”

2. ELSE LET S = getstate(last(first(ST))) be the current state
   (corresponds to SELECT action getaction(last(first(ST))))

   a) IF G ⊆ S THEN getplan(first(ST))

   b) ELSE

      i. MATCH: For all op ∈ O calculate match(op, S) as in def. 2.1.3.
         LET A = {o_1, ..., o_n} be the list of all instantiated operators with
         PRE(o_i) ⊆ S.

      ii. APPLY: For all o_i ∈ A calculate S'_i = Res(o_i, S) as in def. 2.1.6.
         LET AR = {(o_1 S'_1) ... (o_n S'_n)}

      iii. Cycle-Test: Remove all pairs (o_i S'_i) from AR where S'_i is contained
         in first(ST).

      iv. Recursive call:
         IF empty(AR) THEN fwplan(O, G, rest(ST)) (BACKTRACK)
         ELSE fwplan(O, G, append(expand(first(ST), AR), rest(ST))).
Fig. 2.7. A Forward Search Tree for Blocks-World

the figure, but backtracking is explicitly depicted only for the first two. The
generation of action candidates is based on an ordering of put before puttable
and instantiation is based on alphabetical order (A before B before C).
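The planner of table 2.2 can be sketched in a few lines of Python. The sketch
rests on several assumptions: the operators put and puttable are defined in
the usual blocks-world manner (the exact definitions of figure 2.2 are not
repeated here), operators are instantiated in advance so that matching is a
subset test, and the tail-recursive calls of table 2.2 are written as a loop.
All function names except fwplan are hypothetical.

```python
from itertools import permutations
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset
    add: frozenset
    dele: frozenset


def blocks_actions(blocks):
    """Ground the assumed operators; put is generated before puttable."""
    acts = []
    for x, y in permutations(blocks, 2):        # put(x, y), x taken from the table
        acts.append(Action(f"put({x},{y})",
                           frozenset({("ontable", x), ("clear", x), ("clear", y)}),
                           frozenset({("on", x, y)}),
                           frozenset({("ontable", x), ("clear", y)})))
    for x, y, z in permutations(blocks, 3):     # put(x, y), x taken from block z
        acts.append(Action(f"put({x},{y})",
                           frozenset({("on", x, z), ("clear", x), ("clear", y)}),
                           frozenset({("on", x, y), ("clear", z)}),
                           frozenset({("on", x, z), ("clear", y)})))
    for x, y in permutations(blocks, 2):        # puttable(x), x taken from block y
        acts.append(Action(f"puttable({x})",
                           frozenset({("on", x, y), ("clear", x)}),
                           frozenset({("ontable", x), ("clear", y)}),
                           frozenset({("on", x, y)})))
    return acts


def res(action, state):
    """Successor state: remove the DEL-list, then add the ADD-list."""
    return (state - action.dele) | action.add


def fwplan(actions, goals, initial):
    """Depth-first forward planner with a cycle-test on the current path."""
    tree = [[(None, initial)]]                  # search tree as list of paths
    while tree:                                 # empty tree: no plan found
        path = tree[0]
        state = path[-1][1]
        if goals <= state:
            return [o.name for o, _ in path[1:]]
        on_path = {s for _, s in path}
        pairs = [(o, res(o, state)) for o in actions if o.pre <= state]
        pairs = [(o, s) for o, s in pairs if s not in on_path]   # cycle-test
        # An empty list of pairs drops the path, which is the BACKTRACK case.
        tree = [path + [pair] for pair in pairs] + tree[1:]
    return None


ACTIONS = blocks_actions("ABC")
INIT = frozenset({("on", "A", "C"), ("ontable", "B"), ("ontable", "C"),
                  ("clear", "A"), ("clear", "B")})
GOALS = frozenset({("on", "A", "B"), ("on", "B", "C")})
print(fwplan(ACTIONS, GOALS, INIT))
```

Because the search is depth-first, the returned plan is a legal solution but
not necessarily the shortest one.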

For the given sequence in which action candidates are expanded, search is
very inefficient. In fact, all possible states of the blocks-world are generated to
find the solution. In general, uninformed forward search has a high branching
factor. Forward search can be made more efficient if some information about
the probable distance of a state to a goal state can be used. The best known
heuristic forward search algorithm is A* (Nilsson, 1971), where a lower bound
estimate for the distance from the current state to the goal is used to guide
search. For domain independent planning - in contrast to specialized problem
solving - such domain specific information is not available. An approach
for generating such estimates in a pre-processing step to planning was
presented by Bonet and Geffner (1999).

2.3.3 Formal Properties of Planning 

2.3.3.1 Complexity of Planning. Even if a search tree can be kept smaller
than in the example given above, in the worst case planning is NP-complete.
That is, no deterministic algorithm is known which finds a solution (or decides
that a problem is not solvable) in less than exponential time. For a search tree
with a maximal depth of n and a maximal branching factor of m, in the worst case,
planning effort is m^n, i. e., the complete search tree has to be generated.[7] Even
worse, for problems without a restriction of the maximal length of a possible
solution, planning is PSPACE-complete. That is, the problem is at least as hard
as every problem which can be solved with a polynomial amount of memory, and
it is not known to be solvable in polynomial time even by non-deterministic
algorithms (Garey & Johnson, 1979).

Because in the worst case all possible states have to be generated for
plan construction, the number of nodes in the search tree is approximately
equivalent to the number of problem states. For problems where the number
of states grows systematically with the number of objects involved, the
maximal size of the search tree can be calculated exactly. For example, for the
Tower of Hanoi domain[8] with three pegs and n discs, the number of states is
3^n (and the minimal depth of a solution path is 2^n - 1).
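The two numbers given for the Tower of Hanoi domain can be checked for small
n by exhaustive breadth-first exploration. The following Python sketch uses an
assumed state encoding (a tuple which gives the peg of each disc, disc 0 being
the smallest); the function name hanoi_stats is hypothetical.

```python
from collections import deque


def hanoi_stats(n, pegs=3):
    """Return (number of reachable states, minimal number of moves)."""
    start = (0,) * n                 # all discs on the first peg
    goal = (pegs - 1,) * n           # all discs on the last peg
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for src in range(pegs):
            # The top disc of a peg is the smallest disc lying on it.
            top = next((d for d in range(n) if s[d] == src), None)
            if top is None:
                continue
            for dst in range(pegs):
                if dst == src:
                    continue
                # A disc may not be put on a peg which holds a smaller disc.
                if any(s[d] == dst for d in range(top)):
                    continue
                t = s[:top] + (dst,) + s[top + 1:]
                if t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
    return len(dist), dist[goal]


for n in range(1, 6):
    states, depth = hanoi_stats(n)
    print(n, states, depth)
```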

The blocks-world domain is similar to the Tower of Hanoi domain but has
fewer constraints. First, each block can be put on each other block (whereas
a disc can only be put on another disc if it is smaller) and second, the table is
assumed to always have additional free space for a block (instead of only three
pegs where discs can be put). Enumeration of all states of the blocks-world
domain with n blocks corresponds to the abstract problem of generating a
set of all possible sets of lists which can be constructed from n elements. For
example, for three blocks A, B, C:

{ {(A B C)}, {(A C B)}, {(B A C)}, {(B C A)}, {(C A B)}, {(C B A)},
  {(B C), (A)}, {(C B), (A)}, {(A C), (B)}, {(C A), (B)}, {(A B), (C)}, {(B A), (C)},
  {(A), (B), (C)} }.

[7] Note that this only holds for finite planning domains where all states are
enumerable. For more complex domains involving relational symbols over arbitrary
numerical arguments, the search tree becomes infinite and therefore other
criteria have to be introduced for termination (making planning incomplete, see
below).

[8] The Tower of Hanoi domain is introduced in chapter 4.

A single list with three elements represents a single tower, for example a
tower with blocks A on B and B on C for the first element given above. A set
of two lists represents two towers on the table, for example block B lying on
block C and A as a one-block tower. The number of sets of lists corresponds
to the so-called Lah-number (Knuth, 1992).[9] It can be calculated by the
following formula:

\[
\begin{aligned}
a(0) &= 0 \\
a(1) &= 1 \\
a(n) &= [(2n - 1) \cdot a(n - 1)] - [(n - 1) \cdot (n - 2) \cdot a(n - 2)].
\end{aligned}
\]

The growth of the number of states for the blocks-world domain is given for
up to sixteen blocks in table 2.3.



Table 2.3. Number of States in the Blocks-World Domain

# blocks   # states             approx.
 1         1                    1.0 × 10^0
 2         3                    3.0 × 10^0
 3         13                   1.3 × 10^1
 4         73                   7.3 × 10^1
 5         501                  5.0 × 10^2
 6         4051                 4.1 × 10^3
 7         37633                3.8 × 10^4
 8         394353               3.9 × 10^5
 9         4596553              4.6 × 10^6
10         58941091             5.9 × 10^7
11         824073141            8.2 × 10^8
12         12470162233          1.2 × 10^10
13         202976401213         2.0 × 10^11
14         3535017524403        3.5 × 10^12
15         65573803186921       6.6 × 10^13
16         1290434218669921     1.3 × 10^15
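The entries of table 2.3 can be reproduced directly from the recurrence, and
for small n they can be compared with a brute-force enumeration of all sets of
lists. The following Python lines are a minimal sketch; the function names
num_states and enumerate_states as well as the representation of a state as a
frozenset of tuples are assumptions made for this illustration.

```python
def num_states(n):
    """a(0) = 0, a(1) = 1, a(n) = (2n-1) a(n-1) - (n-1)(n-2) a(n-2)."""
    a = [0, 1]
    for k in range(2, n + 1):
        a.append((2 * k - 1) * a[k - 1] - (k - 1) * (k - 2) * a[k - 2])
    return a[n]


def enumerate_states(blocks):
    """All sets of towers (tuples) which can be built from the given blocks."""
    if not blocks:
        return [frozenset()]
    first, rest = blocks[0], blocks[1:]
    result = set()
    for state in enumerate_states(rest):
        # The block either forms a new one-block tower ...
        result.add(state | {(first,)})
        # ... or is inserted at any position of an existing tower.
        for tower in state:
            for i in range(len(tower) + 1):
                new_tower = tower[:i] + (first,) + tower[i:]
                result.add((state - {tower}) | {new_tower})
    return list(result)


print([num_states(n) for n in range(1, 9)])
# [1, 3, 13, 73, 501, 4051, 37633, 394353]
print(len(enumerate_states("ABC")))   # 13
```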



The search tree might be even larger than the number of legal states of a 
problem - i. e., the number of nodes in the state-space. First, some states can 
be constructed more than once (if cycle detection is restricted to paths) and 
second, for some planning algorithms, “illegal” states which do not belong to 
the state space might be constructed. Backward planning by goal regression 
can lead to such illegal states (see sect. 2.3.4). 

An analysis of the complexity of Strips planning is given by Bylander
(1994). Another approach to planning is that the state space (or that part of
it which might contain the searched for solution) is already given. The task of
the planner is then to extract a solution from the search space (this strategy
is used by Graphplan, see sect. 2.4.2.1). This problem is still NP-complete!

[9] More background information can be found at
http://www.research.att.com/cgi-bin/access.cgi/as/njas/sequences/eisA.cgi?Anum=000262.

Because planning is an inherently hard problem, no planning approach 
can outperform alternative approaches in every domain. Instead, different 
planning approaches are better suited for different kinds of domains. 

2.3.3.2 Termination, Soundness, Completeness. A planning algorithm
should - like every search algorithm - be correct and complete:

Definition 2.3.2 (Soundness). A planning algorithm is sound if it only 
generates legal solutions for a planning problem. That is, the generated se- 
quence of actions transforms the given initial state into a state satisfying the 
given top-level goals. Soundness implies that the generated plans are consis- 
tent: A state generated by applying an action to a previous state is consistent 
if it does not contain contradictory literals, i. e., if it belongs to the domain. 
A solution is consistent if it does not contain contradictions with regard to 
variable bindings and to ordering of states. 

Proof of soundness relies on the operational semantics given for operator ap- 
plication (such as our definitions 2.1.6 and 2.1.7). The proof can be performed 
by induction over the length of the sequences of actions. 

Correctness follows from soundness and termination: 

Definition 2.3.3 (Correctness). A search algorithm is correct, if it is 
sound and termination is guaranteed. 

To prove termination, it has to be shown that for each plan step the search
space is reduced such that the termination conditions given for the planning
algorithm are eventually reached. For finite domains and a cycle-test covering
the search tree, the planner always terminates when the complete search tree
has been constructed.

Besides making sure that a planner always terminates and that, if it returns
a plan, that plan is a solution to the input problem, it is desirable that the
planner returns a solution for all problems where a solution exists:

Definition 2.3.4 (Completeness). A search algorithm is complete, if it 
finds a solution if such a solution exists. That is, if the algorithm terminates 
without a solution, then the planning problem has no solution. 

To prove completeness, it has to be shown that the planner terminates
without a solution only if no solution exists for a problem.

We will give proofs for correctness and completeness for our planner DPlan 
in chapter 3. 

Backward planning algorithms based on a linear strategy are incomplete.
The Sussman anomaly is based on that incompleteness. Incompleteness of
linear backward planning is discussed in section 2.3.4.2.
2.3.3.3 Optimality. When abstracting from different costs (such as time,
energy use) of operator application, optimality of a plan is defined with
respect to its length. Operator applications are assumed to have uniform costs
(for example one unit per application) and the cost of a plan is equivalent to
the number of actions it contains.

Definition 2.3.5 (Optimality of a Plan). A plan is optimal if each other 
plan which is a solution for the given problem has equal or greater length. 

For operators involving different (positive) costs a plan is optimal if for 
each other plan which is a solution the sum of the costs of the actions in the 
plan is equal or higher. 

For uninformed planning based on depth-first search, as described in
section 2.3.2, optimality of plans cannot be guaranteed. In fact, there is a
trade-off between efficiency and optimality - to generate optimal plans, the
state-space has to be searched more exhaustively. The obvious search strategy
for obtaining optimal plans is breadth-first search.[10] Universal planning (see
sect. 2.4.4) for deterministic domains results in optimal plans.
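That breadth-first search returns a plan of minimal length under uniform
action costs can be illustrated with the blocks-world example used for forward
planning. The following Python sketch again assumes the usual definitions of
put and puttable and pre-instantiated operators; the names ground_actions and
bfs_plan are hypothetical.

```python
from collections import deque
from itertools import permutations


def ground_actions(blocks):
    """Assumed blocks-world actions as (name, PRE, ADD, DEL) tuples."""
    acts = []
    for x, y in permutations(blocks, 2):
        acts.append((f"put({x},{y})",                 # x comes from the table
                     {("ontable", x), ("clear", x), ("clear", y)},
                     {("on", x, y)},
                     {("ontable", x), ("clear", y)}))
        acts.append((f"puttable({x})",                # x comes from block y
                     {("on", x, y), ("clear", x)},
                     {("ontable", x), ("clear", y)},
                     {("on", x, y)}))
    for x, y, z in permutations(blocks, 3):
        acts.append((f"put({x},{y})",                 # x comes from block z
                     {("on", x, z), ("clear", x), ("clear", y)},
                     {("on", x, y), ("clear", z)},
                     {("on", x, z), ("clear", y)}))
    return [(n, frozenset(p), frozenset(a), frozenset(d)) for n, p, a, d in acts]


def bfs_plan(actions, goals, initial):
    """Breadth-first forward search; states are expanded level by level."""
    parent = {initial: None}          # state -> (predecessor state, action name)
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if goals <= state:
            plan = []
            while parent[state] is not None:
                state, name = parent[state]
                plan.append(name)
            return plan[::-1]
        for name, pre, add, dele in actions:
            if pre <= state:
                successor = (state - dele) | add
                if successor not in parent:       # cycle-test over the whole tree
                    parent[successor] = (state, name)
                    queue.append(successor)
    return None


init = frozenset({("on", "A", "C"), ("ontable", "B"), ("ontable", "C"),
                  ("clear", "A"), ("clear", "B")})
goals = frozenset({("on", "A", "B"), ("on", "B", "C")})
print(bfs_plan(ground_actions("ABC"), goals, init))
# ['puttable(A)', 'put(B,C)', 'put(A,B)']
```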

2.3.4 Backward Planning 

Backward planning often results in smaller search trees than forward
planning because the top-level goals can be used to guide the search. For a long
time planning was used as a synonym for backward planning while forward
planning was often associated with problem solving (see sect. 2.4.5.1). For
backward planning, plan construction starts with the top-level goals and in
each planning step a predecessor state is generated by backward operator
application. Backward planning is also called regression planning.

Before we go into the details of backward planning, we present the
backward search tree for the example we already presented for forward
search (see fig. 2.8). The backward search tree is slightly smaller than the
forward search tree. Ignoring the states which were generated as backtrack
points, in forward search thirteen states have to be visited, in backward search
nine. In general, the savings can be much higher.

There is a price to pay for obtaining smaller search trees. The problem
is that a backward operator application can produce an inconsistent state
description (see def. 2.3.2). We will discuss this problem in section 3.3 in
chapter 3, in the context of our state-based non-linear backward planner DPlan.
In the following, we will discuss the classic (Strips) approach to backward 
planning, using goal regression together with the problem of detecting and 
eliminating inconsistent state descriptions. Furthermore, we will address the 
problem of incompleteness of classical linear backward planning and present 
non-linear planning, which is complete. 



[10] Alternatively, depth-first search can be extended such that all plans are
generated - always keeping the current shortest plan.
Fig. 2.8. A Backward Search Tree for Blocks-World



2.3.4.1 Goal Regression. Regression planning starts with the top-level
goals of a planning problem as input (root of the search tree). For a planning
problem involving three blocks A, B, and C and the top-level goals 𝒢 =
{on(A, B), on(B, C)}, the state depicted in the root node in figure 2.8 is
the only legal goal state with 𝒢 ⊆ s = {on(A, B), on(B, C), ontable(C),
clear(A)}. In contrast to forward planning, planning does not start with a
complete state description s but with a partial state description 𝒢. A planning
step is realized by goal regression:

Definition 2.3.6 (Goal Regression). Let G be a set of (goal) literals and
o an instantiated operator.

- If for an atom p ∈ G it holds that p ∈ ADD(o), then the goal corresponding to p
can be replaced by true, which is equivalent to removing p from G.

- If for an atom p ∈ G it holds that p ∈ DEL(o), then the goal corresponding to
p is destroyed, that is, it must be replaced by false, which is equivalent to
replacing formula G by false.

- If p ∈ G is neither contained in ADD(o) nor in DEL(o), nothing is
changed.

If a formula is reduced to false, an inconsistent state description is
detected. Another possibility to deal with inconsistency is to introduce domain
axioms and deduce contradictions by theorem proving. We give the beginning
of the search tree using goal regression in figure 2.9. A complete regression
tree using the domain specification given in figure 2.4 is given in (Nilsson,
1980, pp. 293-295). Using goal regression, free variables can occur when
new subgoals are introduced. Typically, these variables can be instantiated if a
subgoal expression is reached which matches with the initial state. Depending
on the domain, a lot of effort might go into dealing with inconsistent
states. If more complex operator specifications - such as conditional effects
and all-quantified expressions - are allowed, the number of impossible states
which are constructed and must be detected during plan construction can
explode.
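A single regression step can be sketched as follows. The three cases of
definition 2.3.6 are covered by the test for deleted goals and by removing
the ADD-list; that the preconditions of the operator are added as new
subgoals is the usual completion of the regression step and is an assumption
here, as are the function name regress and the operator definitions used in
the example. The value None stands for a goal formula reduced to false.

```python
def regress(goals, pre, add, dele):
    """Regress a set of goal atoms over an instantiated operator."""
    if goals & dele:
        # Some goal atom is deleted by the operator: the formula becomes false.
        return None
    # Atoms in ADD(o) are reduced to true (removed), all other atoms are kept;
    # the preconditions of the operator become new subgoals (assumption).
    return (goals - add) | pre


goals = frozenset({("on", "A", "B"), ("on", "B", "C")})

# Assumed operator instance put(A, B) with A taken from the table.
put_a_b = (frozenset({("ontable", "A"), ("clear", "A"), ("clear", "B")}),
           frozenset({("on", "A", "B")}),
           frozenset({("ontable", "A"), ("clear", "B")}))

# Assumed operator instance puttable(B) with B taken from C.
puttable_b = (frozenset({("on", "B", "C"), ("clear", "B")}),
              frozenset({("ontable", "B"), ("clear", "C")}),
              frozenset({("on", "B", "C")}))

print(sorted(regress(goals, *put_a_b)))
# [('clear', 'A'), ('clear', 'B'), ('on', 'B', 'C'), ('ontable', 'A')]
print(regress(goals, *puttable_b))   # None
```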




Fig. 2.9. Goal-Regression for Blocks-World 



To summarize, there are the following differences between forward and 
backward planning: 

Complete vs. Partial State Descriptions. Forward planning starts with a com- 
plete state representation - the initial state - while backward planning 
starts with a partial state representation - the top-level goals (which 
might contain variables). 

Consistency of State Descriptions. A planning step in forward planning
always generates a consistent successor state. Soundness of forward
planning follows easily from the soundness of the planning steps. A planning
step in backward planning can result in an inconsistent state
description. In general, a planner might not detect all inconsistencies. To prove
soundness, it must be shown that if a plan is returned, all intermediate
state representations on the path from the top-level goals to the initial
state are consistent.

Variable Bindings. In forward planning, all constructed state representations
are fully instantiated. This is due to the way in which planning operators
are defined: Usually, all variables occurring in an operator are bound
by the precondition. In backward planning, newly introduced subgoals
(i. e., preconditions of an operator) might contain variables which are
not bound by matching the current subgoals with an operator.

In chapter 3 we will introduce a backward planning strategy based on
complete state descriptions: Planning starts with a goal state instead of
the top-level goals. An operator is applicable if all elements of its ADD-list
match with the current state. Operator application is performed by removing
all elements of the ADD-list from the current state description, i. e., all literals
are reduced to true in one step, and by adding the union of preconditions and
DEL-list (see def. 2.1.7). Since usually the elements of the DEL-list are a
subset of the elements of the preconditions, this state-based backward operator
application can be seen as a special kind of goal regression.
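The state-based backward operator application described above can be written
down in a few lines. The following Python sketch is only an illustration of
this paragraph and not the implementation of DPlan; the function names and
the operator instance put(A, B) with its PRE-, ADD-, and DEL-lists are
assumptions.

```python
def backward_apply(state, pre, add, dele):
    """Predecessor state, or None if the ADD-list is not contained in the state."""
    if not add <= state:
        return None
    return (state - add) | pre | dele


def forward_apply(state, pre, add, dele):
    """Successor state, or None if the preconditions do not hold."""
    if not pre <= state:
        return None
    return (state - dele) | add


# Goal state: tower A on B on C.
goal_state = frozenset({("on", "A", "B"), ("on", "B", "C"),
                        ("ontable", "C"), ("clear", "A")})

# Assumed operator instance put(A, B) with A taken from the table.
put_a_b = (frozenset({("ontable", "A"), ("clear", "A"), ("clear", "B")}),
           frozenset({("on", "A", "B")}),
           frozenset({("ontable", "A"), ("clear", "B")}))

predecessor = backward_apply(goal_state, *put_a_b)
print(sorted(predecessor))
# [('clear', 'A'), ('clear', 'B'), ('on', 'B', 'C'), ('ontable', 'A'), ('ontable', 'C')]
print(forward_apply(predecessor, *put_a_b) == goal_state)   # True
```

In this example, applying the operator forward to the calculated predecessor
state yields the goal state again.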

A further difference between forward and backward planning is: 

Goal Ordering. In forward planning, it must be decided which of the action 
candidates whose preconditions are satisfied in the current state is ap- 
plied. In backward planning, it must be decided which (sub-)goal of a 
list of goals is considered next. That is, backward planning involves goal 
ordering. 

In the following, we will show that the original linear strategy for back- 
ward planning is incomplete. 

2.3.4.2 Incompleteness of Linear Planning. A planning strategy is
called linear if it does not allow interleaving of sub-goals. That means, plan
construction is based on the assumption that a problem can be solved by
solving each goal separately (Sacerdoti, 1975). This assumption only
holds for independent goals - Nilsson (1980) uses the term “commutativity”
(see also Georgeff, 1987).

A famous demonstration for incompleteness of linear backward planning
is the Sussman Anomaly (see Waldinger, 1977, for a discussion) illustrated in
figure 2.10. Here, the linear strategy does not work, regardless of the order in
which the sub-goals are approached. If B is put on C, A is covered by these two
blocks and can only be moved if the goal on(B, C) is destroyed. For reaching
the goal on(A, B), first A has to be cleared by putting C on the table; if A
is subsequently put on B, C cannot be moved under B without destroying
goal on(A, B).
Fig. 2.10. The Sussman Anomaly



Linear planning corresponds to dealing with goals organized in a stack:

[on(A, B), on(B, C)]
try to satisfy goal on(A, B)

solve sub-goals [clear(A), clear(B)] [11]
all sub-goals hold after puttable(C)
apply put(A, B)
goal on(A, B) is reached
try to satisfy goal on(B, C).

Interleaving of goals - also called non-linear planning - allows that a 
sequence of planning steps dealing with one goal is interrupted to deal with 
another goal. For the Sussman Anomaly, that means that after block C is 
put on the table pursuing goal on(A, B), the planner switches to the goal 
on(B, C). Non-linear planning corresponds to dealing with goals organized 
in a set: 

{on(A, B), on(B, C)} 
try to satisfy goal on(A, B) 

{clear(A), clear(B), on(A, B), on(B, C)} 
clear(A) and clear(B) hold after puttable(C)
try to satisfy goal on(B, C) 
apply put(B, C) 
try to satisfy goal on(A, B) 
apply put(A, B). 

The correct sequence of goals might not be found immediately but involve 
backtracking. 
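The behaviour of both strategies on the Sussman Anomaly can be checked by
simulating the respective action sequences. The following Python sketch
assumes the initial state of the anomaly (C on A, A and B on the table) and
the usual blocks-world definitions of the three actions involved; the helper
run and the constant names are hypothetical.

```python
# Actions as (PRE, ADD, DEL); these definitions are assumptions.
PUTTABLE_C = (frozenset({("on", "C", "A"), ("clear", "C")}),
              frozenset({("ontable", "C"), ("clear", "A")}),
              frozenset({("on", "C", "A")}))
PUT_A_B = (frozenset({("ontable", "A"), ("clear", "A"), ("clear", "B")}),
           frozenset({("on", "A", "B")}),
           frozenset({("ontable", "A"), ("clear", "B")}))
PUT_B_C = (frozenset({("ontable", "B"), ("clear", "B"), ("clear", "C")}),
           frozenset({("on", "B", "C")}),
           frozenset({("ontable", "B"), ("clear", "C")}))

INIT = frozenset({("on", "C", "A"), ("ontable", "A"), ("ontable", "B"),
                  ("clear", "B"), ("clear", "C")})
GOALS = frozenset({("on", "A", "B"), ("on", "B", "C")})


def run(state, plan):
    """Apply the actions in sequence; None if a precondition is violated."""
    for pre, add, dele in plan:
        if not pre <= state:
            return None
        state = (state - dele) | add
    return state


# Linear, on(A, B) first: afterwards B is covered and cannot be put on C.
s1 = run(INIT, [PUTTABLE_C, PUT_A_B])
print(("on", "A", "B") in s1, PUT_B_C[0] <= s1)      # True False

# Linear, on(B, C) first: afterwards A is covered and cannot be moved.
s2 = run(INIT, [PUT_B_C])
print(("on", "B", "C") in s2, PUT_A_B[0] <= s2)      # True False

# Interleaved: clear A, switch to on(B, C), then return to on(A, B).
s3 = run(INIT, [PUTTABLE_C, PUT_B_C, PUT_A_B])
print(GOALS <= s3)                                   # True
```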



[11] We ignore the additional subgoal ontable(A) resp. on(A, z) here.






Another example to illustrate the incompleteness of linear planning was 
presented by Veloso and Carbonell (1993): Given are a rocket and several 
packages together with operators for loading and unloading packages in and 
from the rocket and an operator for shooting the rocket to the moon - but no 
operator for driving the rocket back from the moon. The planning goal is to 
transport some packages, for example at(PackA, Moon), at(PackB, Moon), 
at(PackC, Moon). If the goals are addressed in a linear way, one package 
would be loaded in the rocket, the rocket would go to the moon, and the 
package would be unloaded. The rest of the packages could never be delivered! 
The correct plan for this problem is to load all packages in the rocket before
the rocket moves to its destination. We will give a plan for the rocket domain 
in chapter 3 and we will show in chapter 8 how this strategy for solving 
problems of the rocket domain can be learned from some initial planning 
experience. 

A second source of incompleteness arises if a planner instantiates variables in
an eager way - also called strong commitment planning. An illustration with a
register-swapping problem is given in (Nilsson, 1980, pp. 305-307). A similar
problem - sorting of arrays - and its solution is discussed in (Waldinger,
1977). We will introduce sorting problems in chapter 3. Modern planners are
based on a non-linear, least commitment strategy.



2.4 Planning Systems 

In this section we give a short overview of the history of and recent trends
in planning research. We will only give more detailed descriptions for those
research areas which are relevant for the later parts of the book. Otherwise,
we will only give short characterizations together with hints to the literature.

2.4.1 Classical Approaches 

2.4.1.1 Strips Planning. The first well-known planning system was Strips
(Fikes & Nilsson, 1971). It integrated concepts developed in the area of
problem solving by state-space search and means-end analysis - as realized
in the General Problem Solver (GPS) from Newell and Simon (1961) (see
sect. 2.4.5.1) - and concepts from theorem proving and situation calculus -
the QA3 system of Green (1969). As discussed above, the basic notions of
the Strips language are still the core of modern planning languages, as for
example PDDL. The Strips planning algorithm - based on goal regression
and linear planning, as described above - on the other hand, was replaced
at the end of the eighties by non-linear, least-commitment approaches. In the
seventies and eighties, lots of work addressed problems with the original Strips
approach, for example (Waldinger, 1977; Lifschitz, 1987).
2.4.1.2 Deductive Planning. The deductive approach to planning as
theorem proving introduced by Green was pursued through the seventies and
eighties mainly by Manna and Waldinger (Manna & Waldinger, 1987). Manna
and Waldinger combined deductive planning and deductive program
synthesis, and we will discuss their work in chapter 6. Current deductive approaches
are based on so-called action languages where actions are represented as
temporal logic formulas. Here plan construction is a process of reasoning about
change (Gelfond & Lifschitz, 1993). At the end of the nineties, symbolic model
checking was introduced as an approach to planning as verification of
temporal formulas in a semantic model (Giunchiglia & Traverso, 1999). Symbolic
model checking is mainly applied in universal planning for deterministic and
non-deterministic domains (see sect. 2.4.4).

2.4.1.3 Partial Order Planning. Also in the seventies, the first partial
order planner (NOAH) was presented by Sacerdoti (1975).[12] Partial order
planning is based on a search in the space of (incomplete) plans. Search
starts with a plan containing only the initial state and the top-level goals.
In each planning step, the plan is refined by either introducing an action
fulfilling a goal or a precondition of another action, or by introducing an
ordering that puts one action in front of another action, or by instantiating
a previously unbound variable. Partial order planners are based on a
non-linear strategy - in each planning step an arbitrary goal or precondition can
be focussed. Furthermore, the least commitment strategy - i. e., refraining
from committing to a specific ordering of planning steps or to a specific
instantiation of a variable as long as possible - was introduced in the context
of partial order planning (Penberthy & Weld, 1992).

The resulting plan is usually not a totally, but only a partially ordered
set of actions. Actions for which no ordering constraints occurred during
plan construction remain unordered. A totally ordered plan can be extracted
from the partially ordered plan by putting parallel (i. e., independent) steps
in an arbitrary order. An overview of partial order planning together with
a survey of the most important contributions to this research is given in
(Russell & Norvig, 1995, chap. 11). Partial order planning was the dominant
approach to planning from the end of the eighties to the mid-nineties. The
main contribution of partial order planning is the introduction of non-linear
planning and least commitment.

2.4.1.4 Total Order Non-linear Planning. Also at the end of the
eighties, the Prodigy system was introduced (Veloso, Carbonell, Perez, Borrajo,
Fink, & Blythe, 1995). Prodigy is a state-based, total-order planner based on
a non-linear strategy. Prodigy is more a framework for planning than a single
planning system. It allows the selection of a variety of planning strategies
and, more important, it allows that search is guided by domain-specific
control knowledge - turning a domain-independent into a more efficient
domain-specific planner. Prodigy includes different strategies for learning such control
knowledge from experience (see sect. 2.5.2) and offers techniques for reusing
already constructed plans in the context of new problems based on analogical
reasoning.

[12] Sacerdoti called NOAH an hierarchical planner. Today, hierarchical
planning means that a plan is constructed on different levels of abstraction (see
sect. 2.4.3.1).

Two other total-order, non-linear backward planners are HSPr (Haslum 
& Geffner, 2000) and DPlan (Schmid & Wysotzki, 2000b) - which we will 
present in detail in chapter 3. 

2.4.2 Current Approaches 

2.4.2.1 Graphplan and Derivates. Since the mid-nineties, a new,
more efficient generation of planning algorithms has dominated the field. The
Graphplan approach presented by Blum and Furst (1997) can be seen as
the starting point of the new development. Graphplan deals with planning as
a network flow problem (Cormen et al., 1990). The process of plan construction
is divided into two parts: first, a so-called planning graph is constructed
by forward search; second, a partial order plan is extracted by backward
search. Plan extraction from a planning graph of fixed size corresponds
to a bounded-length plan construction problem.

The planning graph can be seen as a partial representation of the
state-space of a problem, representing a sub-graph of the state-space which contains
paths from the given initial state to the given planning goals. An example for
a planning graph is given in figure 2.11. It contains fully instantiated
literals and actions. But, in contrast to a state-space representation, nodes in the
graph are not states - i. e., conjunctions of literals - but single literals
(propositions). Because the number of different literals in a domain is considerably
smaller than the number of different sub-sets of those literals (i. e., state
representations), the size of a planning graph does not grow exponentially (see
sect. 2.3.3.1).

The planning graph is organized level-wise. The first level is the set of
propositions contained in the initial state, the next level is a set of actions;
a proposition from the first level is connected with an action in the second
level if it is a precondition for this action. The next level contains
propositions again. For each action of the preceding level, all propositions contained
in its ADD-list are introduced (and connected with the action). The
planning graph contains so-called noop actions at each level which just pass a
literal from level k to level k + 2. Furthermore, so-called mutex relations are
introduced at each level, representing (an incomplete set of) propositions or
actions which are mutually exclusive on a given level. Two actions are
mutex if they interfere (one action deletes a precondition or ADD-effect of the
other) or have competing needs (have mutually exclusive preconditions). Two
propositions p and q are mutex if each action having an add-edge to
proposition p is marked as mutex of each action having an add-edge to proposition
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level 1 



Fig. 2.11. Part of a Planning Graph as Constructed by Graphplan 



q. Construction of a planning graph terminates, when the first time a level 
contains all literals occurring in the planning goal and these literals are not 
mutex. If backward plan extraction fails, the planning graph is extended one 
level. 
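The level-wise construction with noop actions and mutex relations can be sketched in a few lines of Python. This is only an illustrative sketch of the construction phase described above, not the Graphplan implementation: the `Action` tuple, the helper names, and the small "cake" domain (`eat`, `bake`) are assumptions introduced here, and backward plan extraction is omitted.

```python
from itertools import combinations
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset

def noop(p):
    # A noop just passes literal p on to the next proposition level.
    return Action(f"noop({p})", frozenset({p}), frozenset({p}), frozenset())

def pair_in(mutex, x, y):
    return frozenset({x, y}) in mutex

def expand(props, prop_mutex, actions):
    """Proposition level k -> action level k+1 -> proposition level k+2."""
    candidates = list(actions) + [noop(p) for p in sorted(props)]
    applicable = [a for a in candidates
                  if a.pre.issubset(props)
                  and not any(pair_in(prop_mutex, p, q)
                              for p, q in combinations(a.pre, 2))]
    act_mutex = set()
    for a, b in combinations(applicable, 2):
        # interference: one action deletes a precondition or ADD-effect of the other
        interfere = bool(a.delete & (b.pre | b.add)) or bool(b.delete & (a.pre | a.add))
        # competing needs: mutually exclusive preconditions
        competing = any(pair_in(prop_mutex, p, q) for p in a.pre for q in b.pre)
        if interfere or competing:
            act_mutex.add(frozenset({a.name, b.name}))
    new_props = frozenset(p for a in applicable for p in a.add)
    new_mutex = set()
    for p, q in combinations(sorted(new_props), 2):
        adders_p = [a.name for a in applicable if p in a.add]
        adders_q = [a.name for a in applicable if q in a.add]
        # p and q are mutex if every way of adding p excludes every way of adding q
        if all(pair_in(act_mutex, a, b) for a in adders_p for b in adders_q):
            new_mutex.add(frozenset({p, q}))
    return new_props, new_mutex

def build_planning_graph(init, goal, actions, max_levels=20):
    """Return the list of proposition levels (props, mutex), or None."""
    goal = frozenset(goal)
    props, mutex = frozenset(init), set()
    levels = [(props, mutex)]
    for _ in range(max_levels):
        if goal.issubset(props) and not any(pair_in(mutex, p, q)
                                            for p, q in combinations(goal, 2)):
            return levels
        props, mutex = expand(props, mutex, actions)
        levels.append((props, mutex))
    return None

# Hypothetical toy domain: one cake that can be eaten and baked again.
eat = Action("eat", frozenset({"have"}), frozenset({"eaten", "nohave"}), frozenset({"have"}))
bake = Action("bake", frozenset({"nohave"}), frozenset({"have"}), frozenset({"nohave"}))
levels = build_planning_graph({"have"}, {"have", "eaten"}, [eat, bake])
print(len(levels))
```

In this toy domain both goal literals already appear on the second proposition level, but they are mutex there, so the graph has to be extended by one more level before extraction could start.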

Originally, Graphplan was developed for the Strips planning language. Koehler et al. (1997) presented an extension to conditional and universally quantified operator effects (system IPP). Another successful Graphplan-based system is STAN (Long & Fox, 1999). After Graphplan, a variety of so-called compilation approaches have become popular. Starting with a planning graph, the bounded-length plan construction problem is addressed by different approaches to solving canonical combinatorial problems, such as satisfiability solvers (Kautz & Selman, 1996, Blackbox), integer programming, or constraint satisfaction algorithms (see Kambhampati, 2000, for a survey). IPP, STAN, and Blackbox solve blocks-world problems with up to 10 objects in under a second and with up to 13 objects in under 1000 seconds (Bacchus et al., 2000).

Traditional planning algorithms worked for a (small) sub-set of first-order 
logic where in each planning step variables occurring in the operators must be 
instantiated. In the context of partial order planning, the expressive power of 
the original Strips approach was extended by concepts included in the PDDL 
language as discussed in section 2.2.1 (Penberthy & Weld, 1992). In contrast, 
compilation approaches are based on propositional logic. The reduction in the 
expressiveness of the underlying language gives rise to a gain in efficiency for 
plan construction. Another approach based on propositional representations is symbolic model checking, which we will discuss in the context of universal planning (see sect. 2.4.4).

2.4.2.2 Forward Planning Revisited. Another efficient planning system of the late nineties is the system HSP from Bonet and Geffner (1999). HSP is a forward planner based on heuristic search. The power of HSP is based on an efficient procedure for calculating a lower bound estimate h'(s) for the distance (remaining number of actions) from the current state s to a goal state. The heuristic function h'(s) is calculated for a “relaxed” problem P', which is obtained from the given planning problem P by ignoring the DEL-lists of the operators. Thus, h'(s) is set to the number of actions which transform s into a state where the goal literals appear for the first time - ignoring that preconditions of these actions might be deleted on the way.
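A delete-relaxation estimate of this kind can be sketched as a simple fixpoint computation. The sketch below is an assumption-laden illustration, not the HSP procedure itself: the `Action` tuple and the function name are introduced here, and the cost of a set of preconditions is taken as the maximum over its atoms, which keeps the estimate a lower bound for unit-cost actions.

```python
import math
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset   # kept for completeness; the relaxation ignores it

def relaxed_estimate(state, goal, actions):
    """Estimate the distance from state to goal while ignoring all DEL-lists."""
    cost = {p: 0 for p in state}          # atoms of the current state cost nothing
    changed = True
    while changed:
        changed = False
        for a in actions:
            if all(p in cost for p in a.pre):
                c = 1 + max((cost[p] for p in a.pre), default=0)
                for q in a.add:
                    if cost.get(q, math.inf) > c:
                        cost[q] = c
                        changed = True
    # the goal is reached once its most expensive literal has appeared
    return max((cost.get(g, math.inf) for g in goal), default=0)

# Hypothetical chain domain: a -> b -> c, each step deleting its precondition.
ab = Action("a->b", frozenset({"a"}), frozenset({"b"}), frozenset({"a"}))
bc = Action("b->c", frozenset({"b"}), frozenset({"c"}), frozenset({"b"}))
print(relaxed_estimate({"a"}, {"c"}, [ab, bc]))
print(relaxed_estimate({"a"}, {"a", "c"}, [ab, bc]))
```

The second call illustrates the effect of the relaxation: in the original problem the goal {a, c} cannot be reached because a is deleted on the way, but the relaxed problem still reports a finite distance.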

The new generation of forward planners dominated in the 2000 AIPS 
competition (Bacchus et al., 2000). For example, HSP can solve blocks-world 
problems up to 35 blocks in under 1000 seconds. 

2.4.3 Complex Domains and Uncertain Environments 

2.4.3.1 Including Domain Knowledge. Most realistic domains are by far more complex than the blocks-world domain discussed so far. Domain specifications for more realistic domains - such as assembling of machines or logistics problems - might involve large sets of (primitive) operators and/or huge state-spaces. The obvious approach to generate plans for complex domains is to restrict search by providing domain-specific knowledge. Planners relying on domain-specific knowledge can solve blocks-world problems with up to 95 blocks in under one second (Bacchus et al., 2000). In contrast to the general purpose planners discussed so far, such knowledge-based systems are called special purpose planners. Note that all domain-specific approaches require that more effort and time is invested in the development of a formalized domain model. Often, such knowledge is not easy to provide.

One way to deal with a complex domain is to specify plans at different levels of detail. For example (see Russell & Norvig, 1995, p. 368), to launch a rocket, a top-level plan might be: prepare booster rocket, prepare capsule, load cargo, launch. This plan can be refined into several intermediate-level plans until a plan containing executable actions is reached (on the level of detail of insert nut A into hole B, etc.). This approach is called hierarchical planning. For hierarchical planning, in addition to primitive operators which can be instantiated to executable actions, abstract operators must be specified.
An abstract operator represents a decomposition of a problem into smaller 
problems. To specify an abstract operator, knowledge about the structure of 
a domain is necessary. An introduction to hierarchical planning by problem 
decomposition is given in (Russell & Norvig, 1995, chap. 12). 
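The refinement of abstract operators into primitive actions can be illustrated by a minimal sketch. The decomposition table below is hypothetical: only the top-level rocket plan and the step "insert nut A into hole B" come from the text, while the step "close hatch" and the function name `refine` are assumptions made for illustration.

```python
# Abstract operators map to an ordered list of more detailed steps;
# everything that has no entry is treated as a primitive, executable action.
DECOMPOSITION = {
    "launch rocket": ["prepare booster rocket", "prepare capsule", "load cargo", "launch"],
    "prepare capsule": ["insert nut A into hole B", "close hatch"],  # hypothetical refinement
}

def refine(step):
    """Recursively replace abstract operators by their decomposition."""
    if step not in DECOMPOSITION:
        return [step]
    return [primitive for sub in DECOMPOSITION[step] for primitive in refine(sub)]

print(refine("launch rocket"))
```

A real hierarchical planner additionally has to check that the preconditions and effects of the sub-steps are consistent with those of the abstract operator; this sketch only shows the decomposition structure.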

Other techniques to make planning feasible for complex domains are con- 
cerned with reducing the effort of search. One source of complexity is that 
meaningless instantiations and inconsistent states might be generated which 
must be recognized and removed (see discussion in sect. 2.3.4). This problem 
can be reduced or eliminated if domain knowledge is provided in the form of 
axioms and types. A second source of complexity is that uninformed search might lead into areas of the state-space which are far away from a possible solution. This problem can be reduced by providing domain-specific control strategies which guide search. Planners that rely on such domain-specific control knowledge can easily outperform every domain-independent planner (Bacchus & Kabanza, 1996).

As an alternative to explicitly providing a planner with such domain-specific information, this information can be obtained by extracting it from domain specifications by pre-planning analysis (see sect. 2.5.1) or from some example plans using machine learning techniques (see sect. 2.5.2).

2.4.3.2 Planning for Non-deterministic Domains. Constructing plans for real-world domains must take into account that information might be incomplete or incorrect. For example, it might be unknown at planning time whether the weather conditions are sunny or rainy when the rocket is to be launched (see above); or during planning time it is assumed that a certain tool is stored in a certain shelf, but at plan execution time, the tool might have been moved to another location. The first example addresses incompleteness of information due to environmental changes. The second example addresses incorrect information which might be due to environmental changes (e.g., an agent other than the one executing the plan moved the tool) or to non-deterministic actions (e.g., depending on where the planning agent moves after using the tool, he might place the tool back in the shelf or not).

A classical approach to deal with incomplete information is conditional planning. A conditional plan is a disjunction of sub-plans for different contexts (such as good or bad weather conditions). We will see in chapter 8 that introducing
conditions into a plan is a necessary step for combining planning and program 
synthesis. A classical approach to deal with incorrect information is execution 
monitoring and re-planning. Plan execution is monitored and if a violation of 
the preconditions for the action which should be executed next is detected, 
re-planning is invoked. An introduction to both techniques is given in (Russell 
& Norvig, 1995, chap. 13). 

Another approach to deal with non-deterministic domains is reactive plan- 
ning. Here, a set or table of state-action rules is generated. Instead of exe- 
cuting a complete plan, the current state of the environment triggers which 
action is performed next, dependent on what conditions are fulfilled in the 
current state. This approach is also called policy learning and is extensively researched in the domain of reinforcement learning (Dean, Basye, & Shewchuk, 1993; Sutton & Barto, 1998). A similar approach, combining problem solving and decision tree learning, is proposed by Müller and Wysotzki (1995). In the context of symbolic planning, universal planning was proposed as an approach to policy learning (see sect. 2.4.4).

2.4.4 Universal Planning 

Universal planning was originally proposed by Schoppers (1987) as an approach to learn state-action rules for non-deterministic domains. A universal plan represents solution paths for all possible states of a planning problem, instead of a solution for one single initial state. State-action rules are extracted from a universal plan covering all possible states of a given planning problem. Generating universal plans instead of plans for a single initial state was also proposed by Wysotzki (1987) in the context of inductive program synthesis (see part II).

A universal plan corresponds to a breadth-first search tree. Search is performed backward, starting with the top-level goals, and for each node at the current level of the plan all (new and consistent) predecessor nodes are generated. The set of predecessor nodes of a set of nodes S is also called the pre-image of S. An abstract algorithm for universal plan construction is given in table 2.4. Universal planning was criticized as impracticable (Ginsberg, 1989), because such search trees can grow exponentially (see discussion of PSPACE-completeness, sect. 2.3.3.1). Currently, universal planning is experiencing a renaissance, due to the introduction of OBDDs (ordered binary decision diagrams) as a method for a compact representation of universal plans. OBDD representations were originally developed in the context of hardware design (Bryant, 1986) and later adopted in symbolic model checking for efficient exploration of large state-spaces (Burch, Clarke, McMillan, & Hwang, 1992). The new, memory-efficient approaches to universal planning are based on the idea of viewing planning as a model checking problem instead of as a state-space search problem (like Strips planners) or as a theorem proving problem (like deductive planners). Planning as model checking is not only successfully applied to non-deterministic domains - planners MBP (Cimatti, Roveri, & Traverso, 1998) and UMOP (Jensen & Veloso, 2000) - but also to benchmark problems for planning in deterministic domains (Edelkamp, 2000, planner MIPS). Because plan construction is based on breadth-first search, universal plans for deterministic domains represent optimal solutions.

The general idea of planning as model checking is that a planning domain is described as a semantic model. A semantic model can be given, for example, as a finite state machine (Cimatti et al., 1998) or as a Kripke structure (Giunchiglia & Traverso, 1999). A domain model D represents the states and actions of the domain and the state transitions caused by the execution of actions. States are represented as conjunctions of atoms. State transitions (state, action, state') can be represented as formulas state ∧ action → state'. As usual, a planning problem is the problem of finding plans of actions given a planning domain, initial and goal states. Plan generation is done by exploring the state space of the semantic model. At each step, plans are generated by checking the truth of some formulas in the model. Plans are also represented as formulas and planning is modeled as search through sets of states (instead of single states) by evaluating the assignments verifying the corresponding
formulas. Giunchiglia and Traverso (1999) give an overview of planning as model checking.

Table 2.4. Planning as Model Checking Algorithm (Giunchiglia, 1999, fig. 4)

function Plan(P) where P(D, I, G) is a planning problem with
  - I = {s₀} as initial state,
  - G as goals, and
  - D = (F, S, A, R) as planning domain with F as set of fluents, S ⊆ 2^F as finite set of states, A as finite set of actions, and R : S × A → S as transition function. An action a ∈ A is executable in s ∈ S if R(s, a) ≠ ∅.

  CurrentStates := ∅; NextStates := G; Plan := ∅;
  while (NextStates ≠ CurrentStates) do (*)
    if I ⊆ NextStates then return Plan; (**)
    OneStepPlan := OneStepPlan(NextStates, D);
      (calculate pre-image of NextStates)
    Plan := Plan ∪ PruneStates(OneStepPlan, NextStates);
      (eliminate states which have already been visited)
    CurrentStates := NextStates;
    NextStates := NextStates ∪ ProjectActions(OneStepPlan);
      (ProjectActions, given a set of state-action pairs, returns the corresponding set of states)
  return Fail.
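The algorithm of table 2.4 can be transcribed almost literally for explicitly enumerated state sets. The following is a sketch under simplifying assumptions, not one of the cited planners: states are plain Python values instead of OBDD-encoded sets, the transition function `R` is deterministic and returns `None` for non-executable actions, and the toy domain (positions 0 to 3 with actions `inc` and `jump`) is hypothetical.

```python
def one_step_plan(next_states, states, actions, R):
    """Pre-image: all (state, action) pairs whose successor lies in next_states."""
    return {(s, a) for s in states for a in actions
            if R(s, a) is not None and R(s, a) in next_states}

def prune_states(one_step, visited):
    """Eliminate pairs whose state has already been visited."""
    return {(s, a) for (s, a) in one_step if s not in visited}

def project_actions(one_step):
    """Return the states of a set of state-action pairs."""
    return {s for (s, _) in one_step}

def plan(states, actions, R, init, goal):
    current, nxt, result = set(), set(goal), set()
    while nxt != current:                       # condition (*)
        if set(init).issubset(nxt):             # condition (**)
            return result
        step = one_step_plan(nxt, states, actions, R)
        result |= prune_states(step, nxt)
        current = set(nxt)
        nxt = nxt | project_actions(step)
    return None                                 # Fail

# Hypothetical toy domain: positions 0..3, "inc" moves one step, "jump" two.
def R(s, a):
    target = s + {"inc": 1, "jump": 2}[a]
    return None if target > 3 else target

policy = plan(range(4), ["inc", "jump"], R, init={0}, goal={3})
print(sorted(policy))
```

The returned set of state-action pairs can be read as a table of state-action rules: each pair names an action that moves its state one level closer to the goal.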



As mentioned above, plans (as semantic models) can be represented compactly as OBDDs. An OBDD is a canonical representation of a boolean function as a DAG (directed acyclic graph). An example of an OBDD representation is given in figure 2.12. Solid lines represent that the preceding variable is true, broken lines represent that it is false. Note that the size of an OBDD is highly dependent on the ordering of variables (Bryant, 1986).




Fig. 2.12. Representation of the Boolean Formula f(x₁, x₂) = x₁ ∧ x₂ as OBDD
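The dependence of the OBDD size on the variable ordering can be made concrete with a small sketch. The construction below is a naive illustration, not an efficient OBDD package: it enumerates all assignments (exponential in the number of variables) and obtains the reduced diagram by sharing identical nodes and skipping redundant tests. The function name `build_obdd` and the six-variable formula are assumptions chosen for illustration.

```python
def build_obdd(f, order):
    """Build the reduced OBDD of boolean function f for a given variable order.

    f takes a dict {variable: bool}. Returns (root, nodes) where nodes maps
    (variable, low_child, high_child) to a node id; ids 0 and 1 are the terminals.
    """
    nodes = {}

    def make(var, low, high):
        if low == high:                 # redundant test: both branches agree
            return low
        key = (var, low, high)
        if key not in nodes:            # share structurally identical nodes
            nodes[key] = len(nodes) + 2
        return nodes[key]

    def build(i, env):
        if i == len(order):
            return int(bool(f(env)))
        low = build(i + 1, {**env, order[i]: False})   # broken line
        high = build(i + 1, {**env, order[i]: True})   # solid line
        return make(order[i], low, high)

    return build(0, {}), nodes

# The formula of figure 2.12.
_, conj = build_obdd(lambda e: e["x1"] and e["x2"], ["x1", "x2"])

# A formula whose OBDD size depends strongly on the ordering.
def g(e):
    return (e["x1"] and e["x2"]) or (e["x3"] and e["x4"]) or (e["x5"] and e["x6"])

_, good = build_obdd(g, ["x1", "x2", "x3", "x4", "x5", "x6"])
_, bad = build_obdd(g, ["x1", "x3", "x5", "x2", "x4", "x6"])
print(len(conj), len(good), len(bad))   # numbers of inner (non-terminal) nodes
```

Keeping variables that interact in the formula next to each other in the ordering yields the smaller diagram here; separating them forces the diagram to remember the values of x1, x3, and x5 before any of them can be combined with its partner.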



Our planning system DPlan is a universal planner for deterministic domains (see chap. 3). In contrast to the algorithm given in table 2.4, planning problems in DPlan only give planning goals, but not an initial state. DPlan terminates if all states which are reachable from the top-level goals (by calculating the pre-images) are enumerated. That is, DPlan terminates successfully if condition (*) given in table 2.4 is no longer fulfilled, and DPlan does not include condition (**). Furthermore, in DPlan the universal plan is represented as a DAG over states and not as an OBDD. Therefore, DPlan is memory inefficient. But, as we will discuss in later chapters, DPlan is typically applied only to planning problems of small complexity (involving not more than three or four objects). The universal plan is used as a starting point for inducing a domain-specific control program for generating (optimal) action sequences for problems of arbitrary complexity (see sect. 2.5.2).

2.4.5 Planning and Related Fields 

2.4.5.1 Planning and Problem Solving. The distinction between planning and problem solving is not very clear-cut. In the following, we give some distinctions found in the literature (e.g., Russell & Norvig, 1995): Often, forward search based on complete state representations is classified as problem solving, while backward search based on incomplete state representations is classified as planning. While planning is based on a logical representation of states, problem solving can rely on different, special purpose representations, for example feature vectors. While planning is mostly associated with a domain-independent approach, problem solving typically relies on pre-defined domain-specific knowledge. Examples are heuristic functions as used to guide A* search (Nilsson, 1971, 1980) or the difference table used in GPS (Newell & Simon, 1961; Nilsson, 1980).

Cognitive psychology is typically concerned with human problem solving 
and not with human planning. Computer models of human problem solving 
are mostly realized as production systems (Mayer, 1983; Anderson, 1995). A 
production system consists of an interpreter and a set of production rules. 
The interpreter realizes the match-select-apply cycles. A production rule is 
an IF-THEN-rule. A rule fires if the condition specified in its if-part is sat- 
isfied in a current state in working memory. Selection might be influenced 
by the number of times a rule was already applied successfully - coded as 
a strength value. Application of a production rule results in a state change. 
Production rules are similar to operators. But while an operator is specified 
by preconditions and effects represented as sets of literals, production rules 
can encode conditions and effects differently. For example, a condition might 
represent a sequence of symbols which must occur in the current state and the then-part of the rule might give a sequence of symbols which are appended to the current state representation. In cognitive science, production
systems are usually goal-directed - the if-part of a rule represents a currently 
open goal and the then-part either introduces new sub-goals or specifies an 
action. Goal-driven production systems are similar to hierarchical planners. 
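The match-select-apply cycle of a production system interpreter can be sketched as follows. This is a minimal illustration under stated assumptions, not a model such as ACT or SOAR: working memory is a plain symbol string, the if-part of a rule is a symbol sequence that must occur in it, the then-part is appended, and conflict resolution simply prefers the rule with the highest (fixed) strength value. The rules `r1` and `r2` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    if_part: str     # symbol sequence that must occur in working memory
    then_part: str   # symbol sequence appended to working memory
    strength: int = 0

def run(rules, memory, is_goal, max_cycles=20):
    """Run match-select-apply cycles until the goal test succeeds or nothing matches."""
    trace = []
    for _ in range(max_cycles):
        if is_goal(memory):
            break
        matching = [r for r in rules if r.if_part in memory]   # match
        if not matching:
            break
        rule = max(matching, key=lambda r: r.strength)          # select
        memory += rule.then_part                                 # apply
        trace.append(rule.name)
    return memory, trace

rules = [Rule("r1", "ab", "c", strength=1), Rule("r2", "bc", "d", strength=2)]
memory, trace = run(rules, "ab", lambda m: m.endswith("d"))
print(memory, trace)
```

In the second cycle both rules match; the strength value decides which one fires.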

The earliest and most influential approach to problem solving in AI and cognitive psychology is the General Problem Solver (GPS) of Newell and Simon (1961). GPS is very similar to Strips planning. The main difference is that selection of a rule (operator) is guided by a difference table. The difference table has to be provided explicitly for each application domain. For each rule, it is represented which difference between a current state and a goal state it can reduce. For example, a rule for put(A, B) fulfills the goal on(A, B). The process of identifying differences and selecting the appropriate rule is called means-end analysis. The system starts with the top-level goals and a current state. It selects one of the top-level goals; if this goal is already satisfied in the current state, the system proceeds with the next goal. Otherwise, a rule is selected from the difference table. The top-level goal is removed from the list (stack) of goals and replaced by the preconditions of the rule, and so on. Like Strips, GPS is based on a linear strategy and is therefore incomplete. While incompleteness is not a desirable property for an AI program, it might be appropriate to characterize human problem solving. A human problem solver usually wants to generate a solution within reasonable time-bounds and can furthermore only hold a restricted amount of information in his/her short-term memory. Therefore, using an incomplete but simple strategy is rational because it still can work for a large class of problems occurring in everyday life (Simon, 1958).
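Means-end analysis with a difference table can be sketched as a short recursive procedure. The sketch is an illustration of the linear strategy described above, not a reconstruction of GPS: the `Op` tuple, the recursion bound, and the two blocks-world operators are assumptions, and the difference table is simply a dictionary from a goal literal to the operator that reduces this difference.

```python
from typing import NamedTuple

class Op(NamedTuple):
    name: str
    pre: tuple          # preconditions, treated as an ordered list of sub-goals
    add: frozenset
    delete: frozenset

def means_end(state, goals, table, depth=10):
    """Work on the goals one after the other; return (state, plan) or None."""
    plan = []
    for goal in goals:
        if goal in state:               # no difference left for this goal
            continue
        if depth == 0 or goal not in table:
            return None
        op = table[goal]                # rule selected from the difference table
        result = means_end(state, op.pre, table, depth - 1)
        if result is None:
            return None
        state, subplan = result
        if not set(op.pre).issubset(state):
            return None                 # a sub-goal was undone again: linear strategy fails
        state = (state - op.delete) | op.add
        plan += subplan + [op.name]
    return state, plan

# Hypothetical blocks-world fragment.
put_a_b = Op("put(A,B)", ("clear(A)", "clear(B)"),
             frozenset({"on(A,B)"}), frozenset({"clear(B)"}))
unstack_c_b = Op("unstack(C,B)", ("on(C,B)", "clear(C)"),
                 frozenset({"clear(B)"}), frozenset({"on(C,B)"}))
table = {"on(A,B)": put_a_b, "clear(B)": unstack_c_b}

start = frozenset({"on(C,B)", "clear(A)", "clear(C)"})
final_state, plan = means_end(start, ["on(A,B)"], table)
print(plan)
```

Because goals are processed strictly one after the other and never revisited, a later goal may destroy an earlier one; this is the source of the incompleteness mentioned above.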

2.4.5.2 Planning and Scheduling. Planning deals with finding the activities which must be performed to satisfy a given goal. Scheduling deals with finding an (optimal) allocation of activities (jobs) to time segments given limited resources (Zweben & Fox, 1994). For a long time, planning and scheduling research were completely separated. Only in recent years have the interests of both communities converged. While scheduling systems are successfully applied in many areas (such as factories and transportation companies), planning is mostly done “by hand”. Currently, researchers and companies are becoming more interested in automating the planning part. This goal now seems far more realistic than five years ago, due to the emergence of new, efficient planning algorithms. On the other hand, researchers in planning aim at applying their approaches to realistic domains - such as the logistics domain and the elevator domain used as benchmark problems in the AIPS-00 planning competition (Bacchus et al., 2000). (Because of the converging interests in planning and scheduling research, the conference Artificial Intelligence Planning Systems was renamed Artificial Intelligence Planning and Scheduling.) Planning in realistic domains often involves dealing with resource constraints. One approach, proposed for example by Do and Kambhampati (2000), is to model planning and scheduling as two succeeding phases, both solved with a constraint satisfaction approach. Other work aims at integrating resource constraints into plan construction (Koehler, 1998; Geffner, 2000). In chapter 4 we present an approach to integrating function application in planning and example applications to planning with resource constraints.

2.4.5.3 Proof Planning. In the domain of automatic theorem proving, planning is applied to guide proof construction. Proof planning was originally proposed by Bundy (1988). Planning operators represent proof methods with preconditions, postconditions, and tactics. A tactic represents a number of inference steps. It can be applied if the preconditions hold and the postconditions must be guaranteed to hold after the inference steps are executed. A tactic guides the search for a proof by prescribing that certain inference steps should be executed in a certain sequence given some current step in a proof.

An example of a high-level proof method is “proof by mathematical induction”. Bundy (1988) introduces a heuristic called “rippling” to describe the heuristic underlying the Boyer-Moore theorem prover and represents this heuristic as a method. Such a method specifies tactics on different levels of detail - similar to abstract and primitive operators in hierarchical planning.
Thus, a “super-method” for guiding the construction of a complete proof 
invokes several sub-methods which satisfy the post-condition of the super- 
method after their execution. 

Proof planning, like hierarchical planning, requires insight into the structure of the planning domain. Identifying and formalizing such knowledge can be very time-consuming, error-prone, or even impossible, as discussed in section 2.4.3.1. Again, learning might be a good alternative to explicitly specifying such domain-dependent knowledge: First, proofs are generated using only primitive tactics. Such proofs can then be input into a machine learning algorithm and generalized to more general methods (Jamnik, Kerber, & Benzmüller, 2000). In principle, all strategies available for learning control rules or control programs for planning - as discussed in section 2.5.2 - can be applied to proof planning. Of course, such a learned method might be incomplete or non-optimal because it is induced from some specific examples. But the same is true for methods which are generated “by hand”. In both cases, such strategic decisions should not be executed automatically. Instead, they can be offered to a system user and accepted or rejected by user interaction.

2.4.6 Planning Literature 

Classical Strips planning is described in all introductory AI textbooks - such as Nilsson (1980) and Winston (1992). The newest textbook by Russell and Norvig (1995) focuses on partial order planning, which is also described in Winston (1992). Short introductions to situation calculus can also be found in all textbooks. A collection of influential papers in planning research from the beginning until the end of the eighties is presented by Allen, Hendler, and Tate (Eds.) (1990).

The area of planning research has undergone significant changes since the mid-nineties. The newer Graphplan-based and compilation approaches as well as methods of domain-knowledge learning (see sect. 2.5) are not described in textbooks. Good sources to obtain an overview of current research are the proceedings of the AIPS conference (Artificial Intelligence Planning and Scheduling; http://www-aig.jpl.nasa.gov/public/aips00/) and the Journal of Artificial Intelligence Research (JAIR; http://www.cs.washington.edu/research/jair/home.html). Overview papers of special areas of planning can be found in the AI Magazine - for example, a recent overview of current planning approaches by Weld (1999). Furthermore, research in planning is presented in all major AI conferences (IJCAI, AAAI) and AI journals (e.g., Artificial Intelligence). A collection of current planning systems together with the possibility to execute planners on a variety of problems can be found at http://rukbat.fokus.gmd.de:8080/.



2.5 Automatic Knowledge Acquisition for Planning 

2.5.1 Pre-planning Analysis 

Pre-planning analysis was proposed, for example, by Fox and Long (1998) to eliminate meaningless instantiations when constructing a propositional representation of a planning problem, such as a planning graph. The basic idea is to automatically infer types from domain specifications. These types are then used to extract state invariants. In addition to eliminating meaningless instantiations, state invariants can be used to detect and eliminate unsound plans. The domain analysis of the system TIM proposed by Fox and Long (1998) extracts information by analyzing the literals occurring in the preconditions, ADD- and DEL-lists of operators (transforming them into a set of finite state machines).

For example, for the rocket domain (briefly described in sect. 2.3.4.2), it can be inferred that each package is always either in the rocket or at a fixed place, but never at two locations simultaneously.

Extracting knowledge from domain specifications makes planning more 
efficient because it reduces the search space by eliminating impossible states. 
Another possibility to make plan construction more efficient is to guide search 
by introducing domain specific control knowledge. 

2.5.2 Planning and Learning 

Enriching domains with control knowledge which guides a planner to reduce 
search for a solution makes planning more efficient and therefore makes a 
larger class of problems solvable under given time and memory restrictions. 
Because it is not always easy to provide such knowledge and because domain 
modeling is an “art” which is time consuming and error-prone, one area of 
planning research deals with the development of approaches to learn such 
knowledge automatically from some sample experience with a domain. 

(Footnote to sect. 2.4.6: The website http://rukbat.fokus.gmd.de:8080/ was realized by Jürgen Müller as part of his diploma thesis at TU Berlin, supervised by Ute Schmid and Fritz Wysotzki.)

2.5.2.1 Linear Macro-Operators. In the eighties, approaches to learning (linear) macro operators were investigated (Minton, 1985; Korf, 1985): After a plan is constructed for a problem of a given domain, operators which appear directly after each other in the plan can be composed into a single macro operator by merging their preconditions and effects. This process can be applied to pairs or larger sequences of primitive operators. If the planner
is confronted with a new problem of the given domain, plan construction might involve a smaller number of match-select-apply cycles because macro operators can be applied which generate larger segments of the searched-for plan in one step. This approach to learning in planning, however, did not succeed, mainly because of the so-called utility problem. If macros are extracted indiscriminately from a plan, the system might become “swamped” with macro-operators. Possible efficiency gains from reduction of the number of match-select-apply cycles are counterbalanced or even overridden by the number of operators which must be matched.
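Merging the preconditions and effects of two successive Strips operators into one macro operator can be sketched as follows. The merging rule shown here is one standard way to compose operators and is an assumption of this sketch, not necessarily the rule used in the cited systems; the `Operator` tuple and the blocks-world operators are hypothetical, and ADD- and DEL-lists of each operator are assumed to be disjoint.

```python
from typing import NamedTuple

class Operator(NamedTuple):
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset

def apply(op, state):
    assert op.pre.issubset(state), f"{op.name} is not applicable"
    return (state - op.delete) | op.add

def compose(o1, o2):
    """Macro operator for executing o1 directly followed by o2, or None."""
    if (o1.delete - o1.add) & o2.pre:
        return None                          # o1 destroys a precondition of o2
    return Operator(
        name=f"{o1.name};{o2.name}",
        pre=o1.pre | (o2.pre - o1.add),      # what o1 provides need not hold before
        add=(o1.add - o2.delete) | o2.add,   # effects of o1 that survive o2
        delete=(o1.delete - o2.add) | o2.delete,
    )

unstack = Operator("unstack(C,B)", frozenset({"on(C,B)", "clear(C)"}),
                   frozenset({"clear(B)", "ontable(C)"}), frozenset({"on(C,B)"}))
put = Operator("put(A,B)", frozenset({"clear(A)", "clear(B)"}),
               frozenset({"on(A,B)"}), frozenset({"clear(B)"}))

macro = compose(unstack, put)
state = frozenset({"on(C,B)", "clear(C)", "clear(A)"})
print(sorted(apply(macro, state)))
```

Applying the macro gives the same successor state as applying the two primitive operators in sequence, but it needs only one match-select-apply cycle. Every macro added in this way is also one more operator that has to be matched, which is the utility problem described above.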

A similar approach to linear macro learning is investigated in the context of cognitive models of human problem solving (see sect. 2.4.5.1) and learning. The ACT system (Anderson, 1983) and its descendants (Anderson & Lebiere, 1998) are realized as production systems with a declarative component representing factual knowledge and a procedural component representing skills. The procedural knowledge is represented by production rules (if condition then action pairs). Skill acquisition is modelled as “compilation” which includes concatenating primitive rules. This mechanism is used to describe speed-up learning from problem solving experience. Another production system approach to human cognitive skills is the SOAR system (Newell, 1990), a descendant of GPS. Here, “chunking” of rules is invoked if an impasse during generating a problem solution has occurred and was successfully resolved.

2.5.2.2 Learning Control Rules. From the late eighties to the mid-nineties, control rule learning was mainly investigated in the context of the Prodigy system (Veloso et al., 1995). A variety of approaches - mainly based on explanation-based learning (EBL) or generalization (Mitchell, Keller, & Kedar-Cabelli, 1986) - were investigated to learn control rules to improve the efficiency of plan construction and the quality (i.e., optimality) of plans. An overview of all investigated methods is given in Veloso et al. (1995). A control rule is represented as a production rule. For a current state achieved during plan construction, such a control rule provides the planner with information about which choice (next action to select, next sub-goal to focus on) it should make. The EBL approach proposed by Minton (1988) extracts such control rules from an analysis of search trees by explaining why certain branching decisions were made during search for a solution.

Another approach to control rule learning, based on learning decision trees (Quinlan, 1986) or decision lists (Rivest, 1987), is closely related to policy learning in reinforcement learning (Sutton & Barto, 1998). For example, Briesemeister, Scheffer, and Wysotzki (1996) combined problem solving with decision tree learning. For a given problem solution, each state is represented as a feature vector and associated with the action which was executed in this state. A decision tree is induced, representing relevant features of the state description together with the action which has to be performed given specific values of these features. Each path in the decision tree leads to an action (or a “don’t know what to do”) in its leaf and can be seen as a condition-action rule. After a decision tree is learned, new problem solving episodes can be guided by the information contained in the decision tree. Learning can be performed incrementally, leading either to instantiations of up to now unknown condition-action pairs or to a restructuring of the tree. Similar approaches are proposed by Martin and Geffner (2000) and Huang, Selman, and Kautz (2000). Martin and Geffner (2000) learn policies represented in a concept language (i.e., as logical formulas) with a decision list approach. They can show that a larger percentage of complex problems (such as blocks-world problems involving 20 blocks) can be successfully solved using the learned policies and that the generated plans are - while not optimal - reasonably short.

2.5.2.3 Learning Control Programs. An alternative approach to learning “molecular” rules is learning of control programs. A control program generates a possibly cyclic (recursive, iterative) sequence of actions. We will also speak of (recursive) control rules when referring to control programs. Shell and Carbonell (1989) contrast iterative with linear macros and give a theoretical analysis and some empirical demonstrations of the efficiency gains which can be expected using iterative macros. Iterative macros can be seen as programs because they provide a control structure for repeatedly executing a sequence of actions until the condition for looping no longer holds. The authors point out that a control program represents strategic knowledge for a domain. Efficient human problem solving should not only be explained by speed-up effects due to operator-merging but also by acquiring problem solving strategies - i.e., knowledge on a higher level of abstraction.

Learning control programs from some initial planning experience brings 
together inductive program synthesis and planning research which is the main 
topic of the work presented in this book. We will present our approach to 
learning control programs by synthesizing functional programs in detail in 
part II. Other approaches were presented by Shavlik (1990), who applies
inductive logic programming to control program learning, and by Koza
(1992) in the context of genetic programming. Recently, learning recursive
(“open loop”) macros has also been investigated in reinforcement learning (Kalmar
& Szepesvari, 1999; Sun & Sessions, 1999).

While control rules guide search for a plan, control programs eliminate 
search completely. In our approach, we learn recursive functions for a domain 
of arbitrary complexity but with a fixed goal. For example, a program for con- 
structing a tower of sorted blocks can be synthesized from a (universal) plan 
for a three block problem with goal {on(A, B), on(B, C)}. The synthesized
function then can generate correct and optimal action sequences for tower 
problems with an arbitrary number of blocks. Instead of searching for a plan 
needing probably exponential effort, the learned program is executed. Execu- 
tion time corresponds to the complexity class of the program - for example 
linear time for a linear recursion. Furthermore, the learned control programs 
provide optimal action sequences.

While learning a control program for achieving a certain kind of top-level
goals in a domain results in a highly efficient generation of an optimal action
sequence, this might not be true if control knowledge is only learned for a
sub-domain - as for example, clearing a block as subproblem to building a tower of
blocks. In this case, the problem of intelligent indexing and retrieval of control
programs has to be dealt with - similar to the reuse of already generated plans
in the context of a new planning problem (Veloso, 1994; Nebel & Koehler, 
1995). Furthermore, program execution might lead to a state which is not 
on the optimal path of the global problem solution (Kalmar & Szepesvari, 
1999). 
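
To make the notion of a recursive control program concrete, here is a hand-written Python sketch for the block-clearing sub-problem mentioned above. It is not the output of the synthesis method described in part II. The state encoding (a dict mapping each block to the block lying directly on it, or `None`) and the names `puttable` and `clear_block` are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Assumed state encoding: block -> block lying directly on it (None if clear).
State = Dict[str, Optional[str]]


def puttable(block: str, on_top: State) -> State:
    """Return the state after putting `block` on the table."""
    return {below: (None if above == block else above)
            for below, above in on_top.items()}


def clear_block(x: str, on_top: State) -> Tuple[State, List[str]]:
    """Linear recursion: clear the block above x first, then unstack it."""
    above = on_top[x]
    if above is None:                      # base case: x is already clear
        return on_top, []
    state, actions = clear_block(above, on_top)
    return puttable(above, state), actions + [f"puttable({above})"]


tower = {"C": "B", "B": "A", "A": None}    # A on B on C
final_state, plan = clear_block("C", tower)
```

No search takes place: the number of recursive calls is linear in the number of blocks above `x`, which is the sense in which execution time corresponds to the complexity class of the program.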



3. Constructing Complete Sets 
of Optimal Plans 



The planning system DPlan is designed as a tool to support the first step 
of inductive program synthesis - generating finite programs for transforming 
input examples into the desired output. Because our work is in the context 
of program synthesis, planning is for small, deterministic domains and com- 
pleteness and optimality are of more concern than efficiency considerations. 
In the remaining chapters of part I, we present DPlan as a planning system.
In part II we will describe how recursive functions can be induced from
plans constructed with DPlan and show how such recursive functions can be
used as control programs for plan construction. In this chapter, we will
introduce DPlan (sect. 3.1), introduce universal plans as sets of optimal plans
(sect. 3.2), and give proofs for termination, soundness and completeness of
DPlan (sect. 3.3).^1 An extension of DPlan to function application - allowing
planning for infinite domains - is described in the next chapter (chap. 4).



3.1 Introduction to DPlan 

DPlan^2 is a state-based, non-linear, total-order backward planner. DPlan is
a universal planner (see sect. 2.4.4 in chap. 2): Instead of a plan representing
a sequence of actions transforming a single initial state into a state fulfilling
the top-level goals, DPlan constructs a plan representing optimal action
sequences for all states belonging to the planning problem. A short history of
DPlan is given in appendix AA.1.

DPlan differs from standard universal planning in the following aspects: 

— In contrast to universal planning for non-deterministic domains, we do not
extract state-action rules from the universal plan, but use it as starting-
point for program synthesis.

— A planning problem specifies a set of top-level goals and it restricts the
number of objects of the domain, but no initial state is given. We are
interested in optimal transformation sequences for all possible states from
which the goal is reachable.

— A universal plan represents the optimal solutions for all states which can
be transformed into a state fulfilling the top-level goal. That is, plan
construction does not terminate if a given initial state is included in the plan,
but only if expansion of the current leaf nodes does not result in new states
(which are not already contained in the plan).

— A universal plan is represented as “minimal spanning DAG” (directed
acyclic graph) (see sect. 3.2) with nodes as states and arcs as actions.^3
Instead of a memory-efficient representation using OBDDs, an explicit
state-based representation is used as starting-point for program synthesis.
Typically, plans involving only three or four objects are generated and
a control program for dealing with n objects is generalized by program
synthesis.

— Backward operator application is not realized by goal regression over
partial state descriptions (see sect. 2.3.4.1) but over complete state
descriptions. The first step of plan construction expands the top-level goals to a
set of goal states.

^1 The chapter is based on the following previous publications: Schmid (1999),
Schmid and Wysotzki (2000b, 2000a).

^2 Our algorithm is named DPlan in reference to the Dijkstra-algorithm
(Cormen et al., 1990), because it is a single-source shortest-paths algorithm with
the state(s) fulfilling the top-level goals as source.

3.1.1 DPlan Planning Language 

The current system is based on domain and problem specifications in PDDL 
syntax (see sect. 2.2.1 in chap. 2), allowing Strips (see def. 2.1.1) and oper- 
ators with conditional effects. As usual, states are defined as sets of atoms 
(def. 2.1.2) and goals as sets of literals (def. 2.1.4). Operators are defined 
with preconditions, ADD- and DEL-lists (def. 2.1.5). Secondary preconditions
can be introduced to model context-dependent effects (see sect. 2.2.1.2
in chap. 2).

Forward operator application for executing a plan by transforming an 
initial state into a goal state is defined as usual (see def. 2.1.6). Backward 
operator application is defined for complete state descriptions (see def. 2.1.7). 
An extension for conditional effects is given in definition 2.2.1. For universal 
planning, backward operator application must be extended to calculate the 
set of all predecessor states for a given state: 

Definition 3.1.1 (Pre-Image). With $Res^{-1}(o, s)$ we denote backward
application of an instantiated operator o to a state description s (see def. 2.1.7).
We call $s' = Res^{-1}(o, s)$ a predecessor of s. Backward operator application
can be extended in the following way:

— $Res^{-1}(\{o_1 \ldots o_n\}, s)$ represents the “parallel” application of the set of all
actions which satisfy the application condition for state s resulting in a set
of predecessor states.

— The pre-image of a set of states $S = \{s_1, \ldots, s_m\}$ is defined as
$\bigcup_{i=1}^{m} Res^{-1}(\{o_1 \ldots o_n\}, s_i)$.

^3 As we will see for some examples below, for some planning domains, the universal
plan is a specialized DAG - a minimal spanning tree or simply a sequence.
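
The pre-image can be sketched in Python on the clearblock states used later in section 3.1.4.1. The encoding of atoms as tuples, the helper `st`, and the grounding of puttable(x) over all bindings of y are assumptions of this sketch; the forward check inside `pre_image` is the consistency test Res(o, s') = s discussed in the surrounding text.

```python
from itertools import permutations
from typing import FrozenSet, Iterable, NamedTuple, Optional, Set, Tuple

Atom = Tuple[str, ...]
State = FrozenSet[Atom]


class Op(NamedTuple):
    name: str
    pre: State
    add: State
    dele: State  # the DEL-list ("del" is a Python keyword)


def res(o: Op, s: State) -> Optional[State]:
    """Forward application: remove DEL, then add ADD; defined if PRE is a subset of s."""
    return (s - o.dele) | o.add if o.pre <= s else None


def res_inv(o: Op, s: State) -> Optional[State]:
    """Backward application: remove ADD, then add DEL and PRE; defined if ADD is a subset of s."""
    return (s - o.add) | o.dele | o.pre if o.add <= s else None


def pre_image(ops: Iterable[Op], states: Iterable[State]) -> Set[Tuple[str, State]]:
    """All pairs (o, s') with s' = Res^-1(o, s) that pass the check Res(o, s') = s."""
    ops = list(ops)
    pairs = set()
    for s in states:
        for o in ops:
            pred = res_inv(o, s)
            if pred is not None and res(o, pred) == s:
                pairs.add((o.name, pred))
    return pairs


def st(*atoms: str) -> State:
    """Helper: build a state from strings such as 'on A B'."""
    return frozenset(tuple(a.split()) for a in atoms)


# Ground instances of puttable(x) for every binding of the free variable y.
OPS = [Op(f"puttable({x})", st(f"clear {x}", f"on {x} {y}"),
          st(f"ontable {x}", f"clear {y}"), st(f"on {x} {y}"))
       for x, y in permutations("ABC", 2)]

S1 = st("on A B", "on B C", "clear A", "ontable C")
S2 = st("on B C", "clear A", "clear B", "ontable A", "ontable C")
S3 = st("clear A", "clear B", "clear C", "ontable A", "ontable B", "ontable C")
```

For the state S2, two of the three backward-applicable ground operators produce descriptions in which C is clear although B lies on it. The forward check rejects them, so the only remaining predecessor is S1.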

To make sure that plan construction based on calculating pre-images is 
sound and complete, it must be shown that the pre-image of a set of states S 
only contains consistent state descriptions and that the pre-image contains all 
state descriptions which can be transformed into a state in S (see sect. 3.3). 
A DPlan planning problem is defined as 

Definition 3.1.2 (DPlan Planning Problem). A planning problem
$\mathcal{P}(\mathcal{O}, \mathcal{G}, \mathcal{D})$ consists of a set of operators $\mathcal{O}$, a set of top-level goals $\mathcal{G}$, and a
domain restriction $\mathcal{D}$ which can be specified in two ways:

— as a set of state descriptions,

— as a set of goal states.

In standard planning and universal planning, an initial state is given as 
part of a planning problem. A planning problem is solved if an action sequence 
transforming the initial state into a state satisfying the top-level goals is 
found. In DPlan planning, a planning problem is solved if all states given in
$\mathcal{D}$ are included in the universal plan, or if a set of goal states is expanded to
all possible states from which a goal state is reachable.

The initial state has the following additional functions in backward plan 
construction (see sect. 2.3.4 in chap. 2): Plan construction terminates, if the 
initial state is included in the search tree. All partial state descriptions on the 
path from the initial state to the top-level goals can be completed by forward 
operator application. Consistency of state descriptions can be checked for the 
states included in the solution path and inconsistencies on other paths have 
no influence on the soundness of the plan. To make backward planning sound 
and complete, in general at least one complete state description - denoting 
a set of consistent relations between all objects of interest - is necessary. 
For DPlan, this must be either a (set of) complete goal state(s) or the set
of all states to be included in the universal plan. If for a DPlan planning
problem $\mathcal{D}$ is given as a set of goal states, consistency of a predecessor state
can be checked by $Res^{-1}(o, s) = s' \rightarrow Res(o, s') = s$. If $\mathcal{D}$ is given as a
set of (consistent) state descriptions, plan construction is reduced to stepwise
including states from $\mathcal{D}$ in the plan.

3.1.2 DPlan Algorithm 

The DPlan algorithm is a variant of the universal planning algorithm given
in table 2.4. In table 3.1, the DPlan algorithm is described abstractly. Input
is a DPlan planning problem $\mathcal{P}(\mathcal{O}, \mathcal{G}, \mathcal{D})$ with a set of operators $\mathcal{O}$, a set of
top-level goals $\mathcal{G}$ and a domain restriction $\mathcal{D}$ as specified in definition 3.1.2.
Output is a universal plan, representing the union of all optimal plans for the
restricted domain and a fixed goal, which we will describe in detail in section
3.2. In the following, the functions used for plan construction are described.

Table 3.1. Abstract DPlan Algorithm

function DPlan(P) where P is a DPlan planning problem.

    CurrentStates := ∅;
    NextStates := GoalStates(P);
    Plan := ∅;
    while (NextStates ≠ CurrentStates) do
        OneStepPlan := ONESTEPPLAN(NextStates, P);
        Plan := Plan ∪ PRUNESTATES(OneStepPlan, NextStates);
        CurrentStates := NextStates;
        NextStates := NextStates ∪ PROJECTACTIONS(OneStepPlan);
    return Plan.



- GoalStates(P):

  — If $\mathcal{D}$ is defined as set of goal states,
    GoalStates(P) := $\mathcal{D}$.

  — If $\mathcal{D}$ is defined as set of states S,
    GoalStates(P) := $\{s_G \mid \mathcal{G} \subseteq s_G \text{ and } s_G \in S\}$.

- ONESTEPPLAN(States, P):
  Calculates the pre-image of States as described in definition 3.1.1. Each
  state $s \in$ States is expanded to all states s' with $Res(o, s') = s$. States
  s' are calculated by backward operator application $Res^{-1}(o, s) = s'$ where
  all states for which $Res(o, s') = s$ does not hold are removed. If $\mathcal{D}$ is given
  as a set of states, additionally, all states not occurring in $\mathcal{D}$ are removed.
  OneStepPlan returns all pairs (o, s'):
  ONESTEPPLAN(States, P) = $\{(o, s') \mid Res^{-1}(o, s) = s' \wedge s \in \text{States}\}$.

- PRUNESTATES(Pairs, States):
  Eliminates all pairs (o, s') from Pairs which have already been visited:
  PRUNESTATES(Pairs, States) := $\{(o, s') \in \text{Pairs} \mid s' \notin \text{States}\}$.

- PROJECTACTIONS(Pairs):
  Returns the set of all states occurring in Pairs:
  PROJECTACTIONS(Pairs) := $\{s' \mid (o, s') \in \text{Pairs}\}$.

Plan construction corresponds to building a search-tree with breadth- 
first search with filtering of states which already occur on higher levels of 
the tree. Since it is possible that a OneStepPlan contains pairs (o, s'), (o', s')
- i.e., different actions resulting in the same predecessor state - the plan
corresponds not to a tree but to a DAG (Christofides, 1975).
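
The loop of table 3.1 can be transcribed almost literally into Python. The following is a sketch under assumptions: states are arbitrary hashable values, ONESTEPPLAN is passed in as a callback, and the five-transition toy domain `RES` is invented only to show pruning and the DAG property. It is not one of the planning domains of this chapter.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Pair = Tuple[str, Hashable]  # (action o, predecessor state s')


def dplan(goal_states: Iterable[Hashable],
          one_step_plan: Callable[[Set[Hashable]], Set[Pair]]) -> Set[Pair]:
    """Transcription of the abstract DPlan loop (table 3.1)."""
    current_states: Set[Hashable] = set()
    next_states: Set[Hashable] = set(goal_states)
    plan: Set[Pair] = set()
    while next_states != current_states:
        step = one_step_plan(next_states)
        # PRUNESTATES: keep only pairs whose state has not been visited yet
        plan |= {(o, s) for (o, s) in step if s not in next_states}
        current_states = set(next_states)
        # PROJECTACTIONS: collect the states occurring in the pairs
        next_states = next_states | {s for (_, s) in step}
    return plan


# Hypothetical toy domain: forward transitions Res(o, s') = s as a table.
RES: Dict[Pair, Hashable] = {
    ("a", "s1"): "g", ("b", "s2"): "g",
    ("c", "s3"): "s1", ("d", "s3"): "s2",
    ("e", "g"): "s3",          # leads back to the goal and must be pruned
}


def toy_one_step_plan(states: Set[Hashable]) -> Set[Pair]:
    """Pre-image of `states`: all (o, s') with Res(o, s') in states."""
    return {pair for pair, successor in RES.items() if successor in states}


plan = dplan({"g"}, toy_one_step_plan)
```

In the toy domain the state `s3` enters the plan with two actions, `c` and `d`, which is exactly the situation in which the plan is a DAG rather than a tree. The pair `("e", "g")` is never added because `g` is already contained in the plan.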

3.1.3 Efficiency Concerns 

Since planning is an NP-complete or even PSPACE-complete problem (see
sect. 2.3.3.1 in chap. 2), the worst-case performance of every possible
algorithm is exponential. Nevertheless, current Graphplan-based and compilation
approaches (see sect. 2.4.2.1 in chap. 2) show good performance for many
benchmark problems. These algorithms are based on depth-first search and
therefore in general cannot guarantee optimality - i.e., plans with a
minimal length sequence of actions (Koehler et al., 1997). In contrast, universal
planning is based on breadth-first search and can guarantee optimality (for
deterministic domains). The main disadvantage of universal planning is that
plan size grows exponentially for many domains (see sect. 2.3.3.1 in chap. 2).
Encoding plans as OBDDs can often result in compact representations. But, 
the size of an OBDD is dependent on variable ordering (see sect. 2.4.4 in 
chap. 2) and calculating an optimal variable ordering is itself an NP-hard 
problem (Bryant, 1986). 

Planning with DPlan is neither time nor memory efficient. Plan construction
is based on breadth-first search and therefore works without backtracking.
Because paths to nodes which are already covered (by shorter paths) in
the plan are not expanded^4, the effort of plan construction is dependent on
the number of states. The number of states can grow exponentially as shown
for example for the blocks-world domain in section 2.3.3.1 in chapter 2. Plans
are represented as DAGs with state descriptions as nodes - that is, the size
of the plan depends on the number of states, too.^5
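
The growth of the state space can be made tangible with a small calculation. The sketch assumes that a blocks-world state is fully described by how n labelled blocks are arranged into towers on an unbounded table; this counting convention is an assumption of the sketch and may differ in detail from the one used in section 2.3.3.1. Under it, the number of states is the number of ways to split n blocks into k ordered towers, summed over k.

```python
from math import comb, factorial


def blocksworld_states(n: int) -> int:
    """Number of arrangements of n labelled blocks into towers on a table.

    For k towers there are C(n-1, k-1) * n! / k! arrangements:
    order all blocks (n!), cut the sequence into k non-empty towers
    (C(n-1, k-1)), and ignore the order of the towers (divide by k!).
    """
    return sum(comb(n - 1, k - 1) * factorial(n) // factorial(k)
               for k in range(1, n + 1))


counts = {n: blocksworld_states(n) for n in (3, 4, 5)}
```

For three, four and five blocks this gives 13, 73 and 501 states, so each additional block multiplies the number of nodes a universal plan may have to hold by a growing factor.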

DPlan incorporates none of the state-of-the-art techniques for efficient
plan construction: we use no pre-planning analysis (see sect. 2.5.1) and we
do not encode plans as OBDDs. The reason is that we are mainly interested
in an explicit representation of the structure of a domain with a fixed goal and 
a small number of domain objects. A universal plan constructed with DPlan 
represents the structural relations between optimal transformation sequences 
for all possible states. This information is necessary for inducing a recursive 
control program, generalizing over the number of domain objects as we will 
describe in chapter 8. 

3.1.4 Example Problems 

In the following, we will give some example domains together with a DPlan 
problem, that is, a fixed goal and a fixed number of objects. For each DPlan 
problem, we present the universal plans constructed by DPlan. For better 
readability, we abstract somewhat from the LISP-based PDDL encoding 
which is input in the DPlan system (see appendix AA.3). 

3.1.4.1 The Clearblock Problem. First we look at a very simple sub-
domain of blocks-world (see fig. 3.1). There is only one operator - puttable -
which puts a block x from another block on the table. We give the restricted
domain Dom as a set of three states - a tower of A, B, C, a tower of B, C,
and block A lying on the table, and a state where all three blocks are lying
on the table.

^4 Similar to dynamic programming in A* (Nilsson, 1980).

^5 For example, calculating a universal plan with the uncompiled Lisp
implementation of DPlan needs about a second for blocks-world with three objects, about
150 seconds for four objects, and already more than an hour for 5 objects.

Operator:  puttable(x)
PRE:       {clear(x), on(x, y)}
ADD:       {ontable(x), clear(y)}
DEL:       {on(x, y)}
Goal:      {clear(C)}
Dom-S:     {{on(A, B), on(B, C), clear(A), ontable(C)},
            {on(B, C), clear(A), clear(B), ontable(A), ontable(C)},
            {clear(A), clear(B), clear(C), ontable(A), ontable(B), ontable(C)}}



Fig. 3.1. The Clearblock DPlan Problem 



The planning goal is to clear block C. Dom includes one state fulfilling the 
goal and this state becomes the root of the plan. Plan construction results 
in ordering the three states given in Dom with respect to the length of the 
optimal transformation sequence (see fig. 3.2). For this simple problem, the 
plan is a simple sequence, i. e., the states are totally ordered with respect to 
the given goal and the given operator. Note that we present plans with states 
and actions in Lisp notation - because we use the original DPlan output (see 
appendix AA.2). 



((clear A) (clear B) (clear C) (ontable A) (ontable B) (ontable C))

    (PUTTABLE B)

((on B C) (clear A) (clear B) (ontable A) (ontable C))

    (PUTTABLE A)

((on A B) (on B C) (clear A) (ontable C))



Fig. 3.2. DPlan Plan for Clearblock 



Alternatively, the set of all goal states satisfying clear(C) can be
presented. Then, the resulting plan (see fig. 3.3) is a forest. When planning for
multiple goal-states, we introduce the top-level goals as root.



[Figure: plan for Clearblock with the top-level goals as root and the goal states below it, with nodes such as ((on B A) (on A C) (clear B)).]

(ontable omitted for better readability)

Fig. 3.3. Clearblock with a Set of Goal States

3.1.4.2 The Rocket Problem. An example for a simple transportation
(logistics) domain is the rocket domain proposed by Veloso and Carbonell
(1993) (see fig. 3.4). The planning goal is to transport three objects O1, O2,
and O3 from a place A to a destination B. The transport vehicle (Rocket) can
only be moved in one direction (A to B), for example from the earth to the
moon. Therefore, it is important to load all objects before the rocket moves
to its destination. The rocket domain was introduced as a demonstration for
the incompleteness of linear planning (see sect. 2.3.4.2 in chap. 2).



Operators: load(?o, ?l)
PRE:       {at(?o, ?l), at(Rocket, ?l)}
ADD:       {inside(?o, Rocket)}
DEL:       {at(?o, ?l)}
           move-rocket()
PRE:       {at(Rocket, A)}
ADD:       {at(Rocket, B)}
DEL:       {at(Rocket, A)}
           unload(?o, ?l)
PRE:       {inside(?o, Rocket), at(Rocket, ?l)}
ADD:       {at(?o, ?l)}
DEL:       {inside(?o, Rocket)}

Goal:      {at(O1, B), at(O2, B), at(O3, B)}
Dom-G:     {{at(O1, B), at(O2, B), at(O3, B), at(Rocket, B)}}



Fig. 3.4. The DPlan Rocket Problem 



The resulting universal plan is given in figure 3.5. For rocket, the plan 
corresponds to a DAG: loading and unloading of the three objects can be 
performed in an arbitrary sequence, but finally, each sequence results in all 
objects being inside the rocket or at the destination. 

((AT O1 B) (AT O2 B) (AT O3 B) (AT R B))
    ...
((IN O3 R) (IN O1 R) (IN O2 R) (AT R B))
    (MOVE-ROCKET)
((AT R A) (IN O1 R) (IN O2 R) (IN O3 R))
    ...
((AT O3 A) (AT O1 A) (AT O2 A) (AT R A))

(IN = inside, R = rocket)
Fig. 3.5. Universal Plan for Rocket

3.1.4.3 The Sorting Problem. A specification of a sorting problem is
given in figure 3.6. The relational symbol isc(p k) represents that an element
k is at a position p of a list (or more exactly an array). The relational symbol
gt(x y) represents that an element x has a greater value than an element y,
that is, the ordering on natural numbers is represented extensionally.
Relational symbol gt is static, i.e., its truth value is never changed by operator
application. Because swapping of two elements is only restricted such that
two elements are swapped if the first is greater than the second, the resulting
plan corresponds to a set of traces of a selection sort program. We will discuss
synthesis of selection sort in detail in chapter 8.

The universal plan for sorting is given in figure 3.7. For better readability,
the lists themselves instead of their logical description are given. In chapter 4
we will describe how such lists can be manipulated directly by extending
planning to function application.



Operator:  swap(?p, ?q)
PRE:       {isc(?p, ?n1), isc(?q, ?n2), gt(?n1, ?n2)}
ADD:       {isc(?p, ?n2), isc(?q, ?n1)}
DEL:       {isc(?p, ?n1), isc(?q, ?n2)}
Goal:      {isc(p1, 1), isc(p2, 2), isc(p3, 3), isc(p4, 4)}
Dom-G:     {isc(p1, 1), isc(p2, 2), isc(p3, 3), isc(p4, 4),
            gt(4, 3), gt(4, 2), gt(4, 1), gt(3, 2), gt(3, 1), gt(2, 1)}



Fig. 3.6. The DPlan Sorting Problem 
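
How a universal plan for this problem arises can be sketched with a backward breadth-first search over lists instead of logical state descriptions. This is an illustrative Python sketch, not the DPlan implementation: states are tuples, positions are numbered from 1, and `predecessors` and `universal_plan` are names chosen here. The precondition gt(?n1, ?n2) is mirrored by orienting each swap so that its first argument holds the greater element in the predecessor.

```python
from collections import Counter, deque
from itertools import combinations


def predecessors(state):
    """All (action, s') such that applying the swap in s' yields `state`."""
    result = []
    for p, q in combinations(range(len(state)), 2):
        pred = list(state)
        pred[p], pred[q] = pred[q], pred[p]
        # swap(?p, ?q) requires the element at ?p to be greater than that at ?q
        first, second = (p, q) if pred[p] > pred[q] else (q, p)
        result.append((("swap", first + 1, second + 1), tuple(pred)))
    return result


def universal_plan(goal):
    """Backward breadth-first search; returns path lengths g and the DAG edges."""
    g = {goal: 0}
    edges = []
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for action, pred in predecessors(s):
            if pred not in g:               # new state: one level below s
                g[pred] = g[s] + 1
                queue.append(pred)
            if g[pred] == g[s] + 1:         # keep every optimal edge (DAG)
                edges.append((pred, action, s))
    return g, edges


g, edges = universal_plan((1, 2, 3, 4))
levels = Counter(g.values())
```

All 24 permutations of four elements are reached, and no list needs more than three swaps. States that were already found on a higher level are not entered again, which corresponds to the pruning step of table 3.1.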



Fig. 3.7. Universal Plan for Sorting

3.1.4.4 The Hanoi Problem. A specification of a Tower of Hanoi problem
is given in figure 3.8. The Tower of Hanoi problem (e.g., Winston & Horn,
1989, chap. 5) is a well-researched puzzle. Given are n discs (in our case three)
with monotonically increasing size and three pegs. The goal is to have a tower
of discs on a fixed peg (in our case p3) where the discs are ordered by their
size with the largest disc as base and the smallest as top element. Moving of a
disc is restricted in the following way: Only a disc which has no other disc on
its top can be moved. A disc can only be moved onto a larger disc or an empty
peg. Coming up with efficient algorithms (for restricted variants) of the Tower
of Hanoi problem is still ongoing research (Atkinson, 1981; Pettorossi, 1984;
Walsh, 1983; Allouche, 1994; Hinz, 1996). We will come back to this problem
in chapter 8.

The plan for hanoi is given in figure 3.9. Again, states are represented
in abbreviated form.

Operator:  move(?d, ?from, ?to)
PRE:       {on(?d, ?from), clear(?d), clear(?to), smaller(?to, ?d)}
ADD:       {on(?d, ?to), clear(?from)}
DEL:       {on(?d, ?from), clear(?to)}
Goal:      {on(d3, p3), on(d2, d3), on(d1, d2)}
Dom-G:     {on(d3, p3), on(d2, d3), on(d1, d2), clear(d1), clear(p1), clear(p2),
            smaller(p1, d1), smaller(p1, d2), smaller(p1, d3),
            smaller(p2, d1), smaller(p2, d2), smaller(p2, d3),
            smaller(p3, d1), smaller(p3, d2), smaller(p3, d3),
            smaller(d1, d2), smaller(d1, d3), smaller(d2, d3)}



Fig. 3.8. The DPlan Hanoi Problem 
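
The path length assigned to each Hanoi state can be sketched as follows. The sketch abstracts from the relational encoding of figure 3.8 and implements the movement rule as stated in the text (only a top disc may be moved, and only onto a larger disc or an empty peg). Representing a state as a tuple giving the peg of each disc, with disc 0 as the smallest, is an assumption made here for brevity.

```python
from collections import deque

PEGS = 3


def predecessors(state):
    """All (action, s') from which one legal move yields `state`.

    state[d] is the peg (0..2) of disc d; disc 0 is the smallest.
    """
    result = []
    for d, peg in enumerate(state):
        smaller = state[:d]
        if peg in smaller:
            continue                  # a smaller disc lies on d, so d was not moved last
        for src in range(PEGS):
            if src != peg and src not in smaller:
                pred = state[:d] + (src,) + state[d + 1:]
                # action: move disc d from peg src to peg `peg` (1-based names)
                result.append((("move", d + 1, src + 1, peg + 1), pred))
    return result


def path_lengths(goal):
    """Backward breadth-first search assigning each state its distance to the goal."""
    g = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for _, pred in predecessors(s):
            if pred not in g:
                g[pred] = g[s] + 1
                queue.append(pred)
    return g


g = path_lengths((2, 2, 2))           # all three discs on the third peg
```

The search visits all 27 states of the three-disc problem. The states with all discs on the first or on the second peg lie seven moves away from the goal, and no state is further away.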



3.1.4.5 The Tower Problem. The blocks-world domain with operators
put and puttable was introduced in chapter 2 (see figs. 2.2, 2.3, 2.4, 2.5). We
use a variant of the operator definitions given in figure 2.2 where both
variants of put are specified as a single operator with conditional effect. A
plan for the goal {on(A, B), on(B, C)} is given in figure 3.10. We omit
the ontable relations for better readability.

[Figure: universal plan for Hanoi, a graph rooted in the goal state [()()(123)]. Nodes are states in abbreviated list notation, for example [(3)(12)()] or [(1)(23)()]; edges are labelled with move actions, for example (M D2 P3 P2).]

Fig. 3.9. Universal Plan for Hanoi

[Figure: universal plan for Tower, rooted in the goal state ((ON A B) (ON B C) (CT A)), which is reached by the action (PUT A B).]

Fig. 3.10. Universal Plan for Tower



3.2 Optimal Full Universal Plans 

A DPlan plan, as constructed with the algorithm reported in table 3.1, is
a set of action-state pairs (o, s). For the goal state(s) it holds o = nil. For
every other state contained in a DPlan plan it holds that $Res(o, s) = s'$ (see
sect. 3.1.1); that is, for each state s except the goal state(s), there exists at
least one state $s' \neq s$ into which s can be directly transformed by application
of an action o. Consequently, a DPlan plan constitutes a directed graph with
states as nodes and actions as edges. Because a state which is already
contained in the plan is never introduced again on a later level (PruneStates
in tab. 3.1), the resulting graph is also acyclic.

Definition 3.2.1 (Universal Plan). A plan constructed with the DPlan
algorithm given in table 3.1 constitutes a directed acyclic graph (DAG)
$\Pi^* = (S, R)$ with S = ProjectActions(Plan) as set of states and R as set of edges.
The set of edges is given by the pre-images (see tab. 3.1):
$R \subseteq S \times S = \{(o, s') \mid Res^{-1}(o, s) = s'\}$.^6 We call $\Pi^*$ a universal plan.

A DPlan plan is constructed level-wise, starting with the states fulfilling the
top-level goals as root nodes, and the plan contains no cyclic paths. Therefore,
all paths leading from a fixed state to the root have equal length and each
state can be assigned a uniquely determined natural number representing the
number of action applications for transforming it into a goal state.

Definition 3.2.2 (Universal Plan as Order over States).

— Each node $s \in S$ for which no predecessor $Res(o, s)$ exists has a path-length
$g(s) = 0$ (s is a root node and $s \supseteq \mathcal{G}$).

— Each node $s \in S$ with $Res(o, s) = s'$ and $g(s') = i$ has a path-length
$g(s) = i + 1$ (s is a node with $s \not\supseteq \mathcal{G}$).

The path length $g(s) \in \mathbb{N}$ constitutes an order over the states in $\Pi^*$ with
$s < s'$ iff $g(s) < g(s')$.

In contrast to standard universal planning, DPlan constructs a plan which 
contains all states which can be transformed into a state satisfying the given 
top-level goals: 

Definition 3.2.3 (Full Universal Plan). A universal plan $\Pi^* = (S, R)$
is called full with respect to given top-level goals $\mathcal{G}$, if for all states s
for which there exists a transformation sequence
$Res(o, s) = s_i \ldots Res(o_i, s_i) = s_{i-1} \ldots Res(o_1, s_1) = s_0 \supseteq \mathcal{G}$
it holds that $s \in S$.

For example, the universal plan for rocket (see fig. 3.5) defined for three
objects and a single rocket contains transformation sequences for all legal
states which can be defined over C = {O1, O2, O3, Rocket}.

The notion of a full universal plan presupposes that planning with DPlan 
must be complete. Additionally, we require that the universal plan does not 
contain states which cannot be transformed into a goal state, that is, DPlan 
must be sound. Completeness and soundness of DPlan are discussed in section 
3.3. For proving optimality of DPlan we assume completeness: 

Lemma 3.2.1 (Universal Plans are Optimal). For a full universal plan
$\Pi^*$ each path from a state s to a root node contains the minimal number of
edges, that is, the minimal number of actions necessary to transform s into
a goal state.

Proof (Universal Plans are Optimal). We give an informal proof by induction
over the construction of $\Pi^* = (S, R)$ with the DPlan algorithm given
in table 3.1:

— Plan construction starts with a set of goal states $S_G = \{s \mid \mathcal{G} \subseteq s \wedge s \in S\}$.
Since the goal states constitute root nodes in $\Pi^*$ it holds $g(s) = 0$ for all
$s \in S_G$.

— For all nodes s with $g(s) = i > 0$, i gives the minimal number of actions
to transform s into a goal state. It either holds that there exists a state s'
with $Res(o, s') = s$ or that no such state exists. In the second case, s is a
leaf in $\Pi^*$. In the first case, one of the following conditions holds:

1. s' is already introduced at a higher level with $g(s') < g(s)$: (o, s') will
not be included in the plan (see PruneStates in tab. 3.1).

2. s' is not contained at a higher level: (o, s') will be included in the plan
and $g(s') = i + 1$ which is the minimal path-length for s'.

The length of minimal paths in a graph constitutes a metric. That is, for
all paths holds that if the transformation sequence from start to goal node
of the path is optimal then the transformation sequences for all intermediate
nodes on the path are optimal, too.

^6 In contrast to the standard definition of labelled graphs (Christofides, 1975), we
do not discriminate between nodes/edges and labels of nodes/edges.

As a consequence of the optimality of DPlan, each path in $\Pi^*$ from some
state to a goal state represents an optimal solution for this state. In general,
there might be multiple optimal paths from a state to the goal. For example,
for rocket (see sect. 3.1.4.2) there are 3! different possible sequences each for
loading and for unloading three objects. For the leaf node in figure 3.5 there
are therefore 3! x 3! = 36 different optimal solution paths contained in the
universal plan.
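
The number of optimal paths for rocket can be checked with a short Python sketch. It abstracts from the Strips encoding of figure 3.4: a state is a pair of the rocket's place and the placement of each object ("A", "B" or "in" for inside the rocket), and predecessors are obtained by inverting the forward transitions over all enumerated states. These representation choices and the function names are assumptions of the sketch.

```python
from collections import deque
from itertools import product

N_OBJECTS = 3


def successors(state):
    """Forward transitions of the rocket domain for the assumed state encoding."""
    rocket, objs = state
    result = []
    for i, place in enumerate(objs):
        if place == rocket:       # load object i at the rocket's place
            result.append((("load", i + 1, rocket),
                           (rocket, objs[:i] + ("in",) + objs[i + 1:])))
        elif place == "in":       # unload object i at the rocket's place
            result.append((("unload", i + 1, rocket),
                           (rocket, objs[:i] + (rocket,) + objs[i + 1:])))
    if rocket == "A":             # the rocket can only fly from A to B
        result.append((("move-rocket",), ("B", objs)))
    return result


def path_lengths(goal):
    """Backward breadth-first search from the goal over inverted transitions."""
    preds = {}
    for s in ((r, o) for r in "AB"
              for o in product(("A", "B", "in"), repeat=N_OBJECTS)):
        for _, t in successors(s):
            preds.setdefault(t, []).append(s)
    g = {goal: 0}
    queue = deque([goal])
    while queue:
        t = queue.popleft()
        for s in preds.get(t, []):
            if s not in g:
                g[s] = g[t] + 1
                queue.append(s)
    return g


def count_optimal_paths(state, g):
    """Number of paths from `state` to the goal that descend one level per step."""
    paths = {}
    for s in sorted(g, key=g.get):            # goal first, then level by level
        if g[s] == 0:
            paths[s] = 1
        else:
            paths[s] = sum(paths[t] for _, t in successors(s)
                           if g.get(t) == g[s] - 1)
    return paths[state]


GOAL = ("B", ("B", "B", "B"))
LEAF = ("A", ("A", "A", "A"))
g = path_lengths(GOAL)
```

The leaf state with all objects and the rocket at A has path length 7 (three load actions, one move-rocket, three unload actions), and the count of optimal paths evaluates to 36. Moving the empty rocket leads to a state that is not contained in `g`, so that transition never contributes a path.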

A DPlan plan can be reduced to a minimal spanning tree (Christofides, 
1975) by deleting edges such that each node which is not the root node has 
exactly one predecessor. The minimal spanning tree contains exactly one 
optimal solution path for each state. A minimal spanning tree for rocket is 
given in figure 3.11. 



((AT O1 B) (AT O2 B) (AT O3 B) (AT R B))
    ...
((IN O3 R) (IN O1 R) (IN O2 R) (AT R B))
    (MOVE-ROCKET)
((AT R A) (IN O1 R) (IN O2 R) (IN O3 R))
    ...
((AT O3 A) (AT O1 A) (AT O2 A) (AT R A))

Fig. 3.11. Minimal Spanning Tree for Rocket



3.3 Termination, Soundness, Completeness

3.3.1 Termination of DPlan

For finite domains, termination of the DPlan algorithm given in table 3.1 is
guaranteed:

Lemma 3.3.1 (Termination of DPlan). For finite domains, a situation
(NextStates = CurrentStates) will be reached.

This fact holds because the growth of the number of states in plan
construction is monotonic:



Proof (Termination of DPlan). We denote a plan after the t-th iteration of
pre-image calculation with $Plan_t$. From the definition of the DPlan algorithm
follows that NextStates = PROJECTACTIONS($Plan_t$).

— If calculating a pre-image only returns state descriptions which are already
contained in plan $Plan_t$ then (NextStates = CurrentStates) holds
and DPlan terminates after t iterations.

— If calculating a pre-image returns at least one state description which is not
already contained in the plan then these state descriptions are included in
the plan and $|Plan_{t+1}| > |Plan_t|$.

— Because planning domains are assumed to be finite, there is only a fixed
number of different state descriptions and a fixed point $|Plan_{t+1}| = |Plan_t|$,
i.e., a situation (NextStates = CurrentStates) will be reached
after a finite number of steps t.

3.3.2 Operator Restrictions 

Soundness and completeness of plan construction are determined by the
soundness and completeness of operator application. State-based backward
planning has some inherent restrictions which we will describe in the following. We
repeat the definitions for forward and backward operator application given 
in definitions 2.1.6 and 2.1.7: 



\[ Res(o, s) = s \setminus DEL \cup ADD \quad \text{if } PRE \subseteq s \]
\[ Res^{-1}(o, s) = s \setminus ADD \cup \{DEL \cup PRE\} \quad \text{if } ADD \subseteq s. \]

For forward application, $\setminus DEL$ and $\cup ADD$ are only commutative for
$ADD \cap DEL = \emptyset$.^7 When literals from the DEL-list are deleted before adding literals
from the ADD-list, as defined above, we guarantee that everything given in
the ADD-list is really given for the new state.

^7 In PDDL, $ADD \cap DEL = \emptyset$ must always be true because the effects are
given in a single list with DEL-effects as negated literals. Therefore, an
expression (and p (not p)) would represent a contradiction and is not allowed
as specification of an operator effect.

Completeness and soundness of backward planning mean that the following
diagram must commute:

[Diagram: $s' \xrightarrow{Res(o, s')} s$ and $s \xrightarrow{Res^{-1}(o, s)} s'$]

If $Res^{-1}(o, s) = s'$ results in $s = Res(o, s')$ for all $s$, backward planning
is sound. If for all $s'$ with $Res(o, s') = s$ it holds that
$Res^{-1}(o, s) = s'$, backward planning is complete. In the following we will
prove that these propositions hold with some restrictions.

Soundness: $Res(o, Res^{-1}(o, s)) = s$

$$\begin{aligned}
Res(o, Res^{-1}(o, s)) &= Res^{-1}(o, s) \setminus DEL \cup ADD && \text{with } PRE \subseteq Res^{-1}(o, s) \\
&= (s \setminus ADD \cup (DEL \cup PRE)) \setminus DEL \cup ADD && \text{with } PRE \subseteq (s \setminus ADD \cup (DEL \cup PRE)) \text{ and } ADD \subseteq s \\
&= (s \cup (DEL \cup PRE)) \setminus DEL && \text{with } PRE \subseteq (s \setminus ADD \cup (DEL \cup PRE)) \\
&= (s \cup DEL) \setminus DEL && \text{if } PRE \subseteq s \cup DEL \\
&= s && \text{if } s \cap DEL = \emptyset.
\end{aligned}$$

The restriction $PRE \subseteq s \cup DEL$ means for forward operator application,
transforming $s'$ into $s$, that PRE holds after operator application if it is
not explicitly deleted. In backward application, PRE is introduced as a list of
sub-goals and constraints which have to hold in $s'$. Therefore, this restriction
is unproblematic. The restriction $s \cap DEL = \emptyset$ means for forward
application that $s$ contains no literal from the DEL-list. For backward
planning, all literals from the DEL-list are added, and $s$ has contained none
of these literals. This restriction is in accordance with the definition of
legal operators. Thus, soundness of backward application of a Strips operator
is given. In general, soundness of backward planning can be guaranteed by
introducing a consistency check for each constructed predecessor: For a newly
constructed state $s'$, $s = Res(o, s')$ must hold. If forward operator
application does not result in $s$, the constructed predecessor is considered
inadmissible and is not introduced into the plan.
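The consistency check can be sketched with plain set operations. The code below
is an illustration in Python under stated assumptions: operators are
dictionaries with the keys `PRE`, `ADD` and `DEL`, literals are strings, and the
blocks-world operator `puttable_A` is a hypothetical ground instance, not taken
from the planner's sources.

```python
def res(op, s):
    """Forward application: (s minus DEL) union ADD, defined only if PRE is a subset of s."""
    if not op["PRE"] <= s:
        return None
    return (s - op["DEL"]) | op["ADD"]


def res_inv(op, s):
    """Backward application: (s minus ADD) union DEL union PRE, defined only if ADD is a subset of s."""
    if not op["ADD"] <= s:
        return None
    return (s - op["ADD"]) | op["DEL"] | op["PRE"]


def admissible_predecessor(op, s):
    """Return the predecessor only if applying op forward to it reproduces s."""
    pred = res_inv(op, s)
    if pred is not None and res(op, pred) == s:
        return pred
    return None


# Hypothetical ground operator: put block A from block B onto the table.
puttable_A = {
    "PRE": frozenset({"on(A,B)", "clear(A)"}),
    "ADD": frozenset({"ontable(A)", "clear(B)"}),
    "DEL": frozenset({"on(A,B)"}),
}

s = frozenset({"ontable(A)", "ontable(B)", "clear(A)", "clear(B)"})
pred = admissible_predecessor(puttable_A, s)

# A state that still contains a DEL literal violates s ∩ DEL = ∅ and is rejected.
rejected = admissible_predecessor(puttable_A, s | {"on(A,B)"})
```

The second call shows the restriction $s \cap DEL = \emptyset$ at work: the
constructed predecessor does not lead back to the given state, so it is
discarded.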

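Completeness: $Res^{-1}(o, Res(o, s')) = s'$

$$\begin{aligned}
Res^{-1}(o, Res(o, s')) &= Res(o, s') \setminus ADD \cup (DEL \cup PRE) && \text{with } ADD \subseteq Res(o, s') \\
&= (s' \setminus DEL \cup ADD) \setminus ADD \cup (DEL \cup PRE) && \text{with } ADD \subseteq (s' \setminus DEL \cup ADD) \text{ and } PRE \subseteq s' \\
&= s' \setminus DEL \cup (DEL \cup PRE) && \text{with } PRE \subseteq s' \text{ if } s' \cap ADD = \emptyset \\
&= s' \cup PRE && \text{with } PRE \subseteq s' \text{ if } DEL \subseteq s' \\
&= s' && \text{because } PRE \subseteq s'.
\end{aligned}$$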

The restriction s' n ADD = 0 means that only such states can be con- 
structed which do not already contain a literal added by an applicable opera- 
tor. The restriction DEL C s' means that only such states can be constructed 
which contain all literals which are deleted by an applicable operator. While 
this is a real source of incompleteness, we are still looking for a meaningful 
domain where these cases occur. In general, incompleteness can be overcome 
- with a loss of efficiency - by constructing predecessors in the following way: 



$$Res^{-1}(o, s) = s \setminus ADD^* \cup (DEL^* \cup PRE)$$

for all combinations of subsets $ADD^*$ of $ADD$ and $DEL^*$ of $DEL$ if
$ADD \subseteq s$. Thus, all possible states containing subsets of DEL and ADD,
with the special case of inserting all literals from DEL and deleting all
literals from ADD, would be constructed. Of course, most of these states might
not be sound!
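The subset construction can be made concrete as follows. This is a sketch in
Python under assumptions: operators are dictionaries of frozensets, the helper
names are invented for illustration, and the final filter applies the
forward-application consistency check to weed out unsound candidates.

```python
from itertools import chain, combinations


def subsets(literals):
    """All subsets of a set of literals, as frozensets."""
    items = sorted(literals)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))


def all_predecessors(op, s):
    """(s minus ADD*) union DEL* union PRE for all subsets ADD* of ADD and DEL* of DEL."""
    if not op["ADD"] <= s:
        return set()
    return {(s - add_sub) | del_sub | op["PRE"]
            for add_sub in subsets(op["ADD"])
            for del_sub in subsets(op["DEL"])}


def forward(op, s):
    """Forward application, or None if the precondition does not hold."""
    return (s - op["DEL"]) | op["ADD"] if op["PRE"] <= s else None


def sound_predecessors(op, s):
    """Keep only candidates from which op leads back to s."""
    return {p for p in all_predecessors(op, s) if forward(op, p) == s}


# Hypothetical propositional operator.
op = {"PRE": frozenset({"p"}), "ADD": frozenset({"q"}), "DEL": frozenset({"r"})}
s = frozenset({"p", "q"})

candidates = all_predecessors(op, s)
sound = sound_predecessors(op, s)
```

For this operator the standard definition yields only the predecessor {p, r},
whereas the subset construction also finds {p}, {p, q} and {p, q, r}. The price
is a number of candidates that grows exponentially with the size of the ADD and
DEL lists.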

For more general operator definitions, as allowed by PDDL, further re- 
strictions arise. For example, it is necessary, that secondary preconditions are 
mutually exclusive. 

Another definition of sound and complete backward operator application
is given by Haslum and Geffner (2000): For $s \cap ADD \neq \emptyset$ and
$s \cap DEL = \emptyset$, $Res^{-1}(o, s) = s \setminus ADD \cup PRE$. For
$DEL \subseteq PRE$, this definition is equivalent to our definition.

3.3.3 Soundness and Completeness of DPlan

Definition 3.3.1 (Soundness of DPlan). A universal plan $\Pi^*$ is sound
if each path $\Pi = \{(nil, s_f), (o_1, s_1), \ldots, (o_n, s_n)\}$ defines a
solution. That is, for an initial state corresponding to $s_n$, the sequence of
actions $o_n \circ \ldots \circ o_1$ transforms $s_n$ into a goal state.

Soundness of DPlan follows from the soundness of backward operator 
application as defined in section 3.3.2. 

Definition 3.3.2 (Completeness of DPlan). A universal plan $\Pi^*$ is complete
if the plan contains all states $s$ for which an action sequence transforming
$s$ into a goal state exists.

Completeness of DPlan follows from the completeness of backward oper- 
ator application as defined in section 3.3.2. 

If DPlan works on a domain restriction $\mathcal{D}$ given as a set of states,
only such states are included in the plan which also occur in $\mathcal{D}$.
From soundness and completeness of DPlan it also follows that, for a given set
of states $\mathcal{D}$, DPlan terminates with
$\mathcal{D} = \mathrm{ProjectActions}(Plan)$ if the state space underlying
$\mathcal{D}$ is (strongly) connected.^8 If DPlan terminates with
$\mathrm{ProjectActions}(Plan) \subset \mathcal{D}$, then
$\mathcal{D} \setminus \mathrm{ProjectActions}(Plan)$ only contains states from
which no goal state in $\mathrm{ProjectActions}(Plan)$ is reachable. One reason
for non-reachability can be that the state space is not connected. That is, it
contains states on which no operator is applicable. For example, in the
restricted blocks-world domain where only a puttable but no put operator is
given (see sect. 3.1.4.1), no operator is applicable in a state where all blocks
are lying on the table. As a consequence, a goal on(x, y) is not reachable from
such a state. A second reason for non-reachability can be that the state space
is weakly connected (i.e., is a directed graph) such that it contains states
$s, s'$ where $Res(o, s) = s'$ is defined but not $Res(o', s') = s$. For example,
in the rocket domain (see sect. 3.1.4.2), it is not possible to reach a state
where the rocket is on the earth from a state where the rocket is on the moon.
As a consequence, a goal at(Rocket, Earth) is not reachable from any state where
the rocket is on the moon.

^8 For definitions of connectedness of graphs, see for example (Christofides, 1975).



4. Integrating Function Application 
in State-Based Planning 



Standard planning is defined for logical formulas over variables and
constants. Allowing general terms, that is, application of arbitrary
symbol-manipulating and numerical functions, enlarges the scope of problems for
which plans can be constructed. In the context of using planning as starting
point for the synthesis of functional programs, we are interested in applying
planning to standard programming problems, such as list sorting. In the
following, we will first give a motivation for extending planning to function
application (sect. 4.1), then an extension of the Strips language presented in
section 2.1 is introduced (sect. 4.2) together with some extensions (sect. 4.3),
and finally examples for planning with function application are given
(sect. 4.4).^1



4.1 Motivation 

The development of efficient planning algorithms in the nineties (see
sect. 2.4.2 in chap. 2), such as Graphplan, SAT planning, and heuristic
planning (Blum & Furst, 1997; Kautz & Selman, 1996; Bonet & Geffner, 1999),
has made it possible to apply planning to more demanding real-world
domains. Examples are the logistics or the elevator domain (Bacchus et al.,
2000). Many realistic domains involve manipulation of numerical objects. For
example: when planning (optimal) routes for delivering objects as in the
logistics domain, it might be necessary to take into account time and other
resource constraints such as fuel; plan construction for landing a space vehicle
involves calculations for the correct adjustments of thrusts (Pednault, 1987).
One obvious way to deal with numerical objects is to assign and update their
values by means of function applications.

There is some initial work on extending planning formalisms to deal with
resource constraints (Koehler, 1998; Laborie & Ghallab, 1995). Here, function
application is restricted to manipulation of specially marked resource
variables in the operator effect. For example, the amount of fuel might be
decreased in relation to the distance an airplane flies:
$gas -= distance(?x ?y)/3 (Koehler, 1998). A more general approach for including
function applications into planning was proposed by Pednault (1987) within the
action description language (ADL). In addition to ADD/DEL operator effects, ADL
allows variable updates by assigning new values which can be calculated by
arbitrary functions. For example, the put(?x, ?y) operator in a blocks-world
domain might be extended to updating the state variable ?LastBlockMoved
to ?LastBlockMoved := ?x; the amount of water in a jug ?j2 might be changed
by a pour(?j1, ?j2) action into ?j2 := min(?c2, ?j1 + ?j2), where ?j1, ?j2 are
the current quantities of water in the jugs and ?c2 is the capacity of jug ?j2
(Pednault, 1994).

^1 The chapter is based on the paper Schmid, Muller, and Wysotzki (2000).

U. Schmid: Inductive Synthesis of Functional Programs, LNAI 2654, pp. 71-91, 2003.

© Springer-Verlag Berlin Heidelberg 2003

Introducing functions into planning not only makes it possible to deal
with numerical values in a more general way than allowed for by a purely
relational language but has several additional advantages (see also Geffner,
2000), which we will illustrate with the Tower of Hanoi domain (see fig. 4.1)^2:
Allowing not only numerical but also symbol-manipulating functions makes
it possible to model operators in a more compact and sometimes also more
natural way. For example, representing which disks are lying on which peg in
the Tower of Hanoi domain could be realized by a predicate on(?disks, ?peg)
where ?disks represents the ordered sequence of disks on peg ?peg, with list
[∞] representing an empty peg. Instead of modeling a state change by adding
and deleting literals from the current state, arguments of the predicates can
be changed by applying standard list-manipulation functions (e.g., built-in
Lisp functions like car(l) for returning the first element of a list, cdr(l)
for returning a list without its first element, or cons(x, l) for inserting a
symbol x in front of a list l).

A further advantage is that objects can be referred to in an indirect way. 
In the Hanoi example, car(l) refers to the object which is currently the up- 
permost disk on a peg. There is no need for any additional fluent predicate 
besides on(?disks, ?peg). The clear(?disk) predicate given in the standard 
definition becomes superfluous. Geffner (2000) points out that indirect ref- 
erence reduces substantially the number of possible ground atoms and in 
consequence the number of possible actions, thus plan construction becomes 
more efficient. Indirect object reference additionally allows for modeling infi- 
nite domains while the state representations remain small and compact. For 
example, car(l) gives us the top disk of a peg regardless of how many disks
are involved in the planning problem. 

Finally, introducing functions in modeling planning domains often makes
it possible to get rid of static predicates. For example, the smaller(?x, ?y)
predicate in the Hanoi domain can be eliminated. By representing disks as
numbers, the built-in predicate "<" can be used to check whether the
application constraint for the Hanoi move operator is satisfied in the current
state. Allowing arbitrary Boolean operators to express preconditions
generalizes matching of literals to constraint satisfaction.

^2 Note that we use prefix notation $p(x_1, \ldots, x_n)$ for relations and
functions throughout the paper.

(a) Standard representation
Operator: move(?d, ?from, ?to)
PRE: {on(?d, ?from), clear(?d), clear(?to), smaller(?d, ?to)}
ADD: {on(?d, ?to), clear(?from)}
DEL: {on(?d, ?from), clear(?to)}

Goal: {on(d3, p3), on(d2, d3), on(d1, d2)}
Initial State: {on(d3, p1), on(d2, d3), on(d1, d2), clear(d1), clear(p2), clear(p3),
                smaller(d1, p1), smaller(d1, p2), smaller(d1, p3),
                smaller(d2, p1), smaller(d2, p2), smaller(d2, p3),
                smaller(d3, p1), smaller(d3, p2), smaller(d3, p3),
                smaller(d1, d2), smaller(d1, d3), smaller(d2, d3)}

(b) Functional representation
Operator: move(?pi, ?pj)
PRE: {on(?li, ?pi), on(?lj, ?pj), car(?li) ≠ ∞, car(?li) < car(?lj)}
UPDATE: change ?lj in on(?lj, ?pj) to cons(car(?li), ?lj)
        change ?li in on(?li, ?pi) to cdr(?li)

Goal: {on([∞], p1), on([∞], p2), on([1 2 3 ∞], p3)}
Initial State: {on([1 2 3 ∞], p1), on([∞], p2), on([∞], p3)}
               ; ∞ represents a dummy bottom disk

Fig. 4.1. Tower of Hanoi (a) Without and (b) With Function Application

The need to extend standard relational specification languages to more 
expressive languages, resulting in planning algorithms applicable to a wider 
range of domains, is recognized in the planning community. The PDDL plan- 
ning language for instance (McDermott, 1998b), the current standard for 
planners, incorporates all features which extend ADL over classical Strips. 
But while there is a clear operational semantics for conditional effects and 
quantification (Koehler et al., 1997), function application is dealt with in a 
largely ad-hoc manner (see also remark in Geffner, 2000). 

The only published approach allowing functions in domain and problem 
specifications is Geffner (2000). He proposed Functional Strips, extending 
standard Strips to support first-order-function symbols. He gives a denota- 
tional state-based semantics for actions and an operational semantics, de- 
scribing action effects as updates of state representations. In contrast to the 
classical relational languages, states are not represented as sets of (ground) 
literals but as state variables (cf. situation calculus). For example, the initial
position of disk d1 is represented as loc(d1) = d0 (see fig. 4.2).

While in Functional Strips relations are completely replaced by functional
expressions, our approach allows a combination of relational and functional
expressions. We extended the Strips planning language to a subset of
first-order logic, where relational symbols are defined over terms, and terms
can be variables, constant symbols, or functions with arbitrary terms as
arguments. Our state representations are still sets of literals, but we
introduce a second class of relational symbols (called constraints) which are
defined over arbitrary functional expressions, as shown in figure 4.1. Standard
representations containing only literals defined over constant symbols are
included as a special case, and we can combine ADD/DEL effects and updates in
operator definitions.

Domains: Peg: p1, p2, p3 ; the pegs
         Disk: d1, d2, d3 ; the disks
         Disk*: Disk, d0 ; the disks and a dummy bottom disk 0
Fluents: top: Peg -> Disk* ; denotes top disk in peg
         loc: Disk -> Disk* ; denotes disk below given disk
         size: Disk* -> Integer ; represents disk size

Action: move(?pi, ?pj : Peg) ; moves between pegs
Prec: top(?pi) ≠ d0, size(top(?pi)) < size(top(?pj))
Post: top(?pi) := loc(top(?pi)); loc(top(?pi)) := top(?pj); top(?pj) := top(?pi)

Init: loc(d1) = d0, loc(d2) = d1, loc(d3) = d2, loc(d4) = d0,
      top(p1) = d1, top(p2) = d0, top(p3) = d0,
      size(d0) = 4, size(d1) = 3, size(d2) = 2, size(d3) = 1, size(d4) = 0
Goal: loc(d1) = d0, loc(d2) = d1, loc(d3) = d2, loc(d4) = d3, top(p3) = d4

Fig. 4.2. Tower of Hanoi in Functional Strips (Geffner, 1999)



4.2 Extending Strips to Function Applications

In the following we introduce an extension of the Strips language (see 
def. 2.1.1) from propositional representations to a larger subset of first order 
logic, allowing general terms as arguments of relational symbols. Operators 
are defined over these more general formulas, resulting in a more complex 
state transition function. 

Definition 4.2.1 (FPlan Language). The language
$\mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F}, \mathcal{R}_P \cup \mathcal{R}_C)$
is defined over sets of variables $\mathcal{X}$, constant symbols $\mathcal{C}$,
function symbols $\mathcal{F}$, and relational symbols
$\mathcal{R} = \mathcal{R}_P \cup \mathcal{R}_C$ in the following way:

- Variables $x \in \mathcal{X}$ are terms.

- Constant symbols $c \in \mathcal{C}$ are terms.

- If $f \in \mathcal{F}$ is a function symbol with arity $\alpha(f) = i$ and
  $t_1, t_2, \ldots, t_i$ are terms, then $f(t_1, t_2, \ldots, t_i)$ is a term.

- If $p \in \mathcal{R}_P$ is a relational symbol with arity $\alpha(p) = j$ and
  $t_1, t_2, \ldots, t_j$ are terms, then $p(t_1, t_2, \ldots, t_j)$ is a formula.
  If $r \in \mathcal{R}_C$ is a relational symbol with arity $\alpha(r) = j$ and
  $t_1, t_2, \ldots, t_j$ are terms, then $r(t_1, t_2, \ldots, t_j)$ is a formula.
  We call formulas with relational symbols from $\mathcal{R}_C$ constraints.
  For short, we write $p(t)$.

- If $p_1(t_1), p_2(t_2), \ldots, p_k(t_k)$ with
  $p_1, p_2, \ldots, p_k \in \mathcal{R}$ are formulas, then
  $\{p_1(t_1), p_2(t_2), \ldots, p_k(t_k)\}$ is a formula, representing the
  conjunction of literals.

- There are no other FPlan formulas.

Remarks:

- Formulas consisting of a single relational symbol are called (positive)
  literals.

- Terms over $\mathcal{C} \cup \mathcal{F}$, i.e., terms without variables, are
  called ground terms. Formulas over ground terms are called ground formulas.

- With $\mathcal{X}(F)$ we denote the variables occurring in formula $F$.

An example of a state representation, goal definition, and operator
definition using the FPlan language is given in figure 4.1.b for the hanoi
domain. FPlan is instantiated in the following way:
$\mathcal{X}$ = {?pi, ?pj, ?li, ?lj}, $\mathcal{C}$ = {p1, p2, p3, 1, 2, 3, ∞},
$\mathcal{F}$ = {[ ], car, cdr, cons}, $\mathcal{R}_P$ = {on},
$\mathcal{R}_C$ = {≠, <}.

A state in the hanoi domain, such as {on([2 3 ∞], p1), on([∞], p2),
on([1 ∞], p3)}, is given as a set of atoms where the arguments can be arbitrary
ground terms. We can use constructor functions to define complex data
structures. For example, [2 3 ∞] is a list of constant symbols, where [ ] is a
list constructor function defined over an arbitrary number of elements. The atom
on([2 3 ∞], p1) corresponds to the evaluated expression on(cdr([1 2 3 ∞]), p1),
where cdr(?l) returns a list without its first element. State transformation
by updating is defined by such evaluations.

Relational symbols are divided into two categories: symbols in $\mathcal{R}_P$
and symbols in $\mathcal{R}_C$. Symbols in $\mathcal{R}_P$ denote relations which
characterize a problem state. For example, on([1 2 3 ∞], p1) with
on $\in \mathcal{R}_P$ is true in a state where disks 1, 2, and 3 are lying on
peg p1. Symbols in $\mathcal{R}_C$ denote additional characteristics of a state,
which can be inferred from a formula over $\mathcal{R}_P$. For example,
car($L_i$) ≠ ∞ with ≠ $\in \mathcal{R}_C$ is true for $L_i$ = [1 2 3 ∞].
Relations over $\mathcal{R}_C$ often are useful for representing constraints,
especially constraints which must hold for all states, that is, static
predicates.

Preconditions are defined as arbitrary formulas over
$\mathcal{R}_P \cup \mathcal{R}_C$. Standard preconditions, such as
p = on(?l1, ?x), can be transformed into constraints: If match(p, s) results in
$\sigma_1$ = {?l1 ← [1 2 3 ∞], ?x ← p1}, then the constraints eq(?x, p1) and
eq(?l1, [1 2 3 ∞]), with eq as equality test for symbols in $\mathcal{R}_C$,
must hold.

We allow constraints to be defined over free variables, i.e., variables that
cannot be instantiated by matching an operator precondition with a current
state. For example, variables ?i and ?j in the constraint
nth(?i, ?l) < nth(?j, ?l) might be free. To restrict the possible instantiations
of such variables, they must be declared together with a range in the problem
specification (see sect. 4.4.5 for an example).

Definition 4.2.2 (Free Variables in $\mathcal{R}_C$). For a formula
$F \in \mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F}, \mathcal{R}_P \cup \mathcal{R}_C)$,
variables occurring only in literals over $\mathcal{R}_C$ are called free. The
set of such variables is denoted as $\mathcal{X}_C \subseteq \mathcal{X}$, and
with $\mathcal{X}_C(F)$ we refer to free variables in $F$.

For all variables $x$ in $\mathcal{X}_C$, instantiations must be restricted by a
range $R(x)$. For variables belonging to an ordered data type (e.g., natural
numbers), a range is declared as $R(x) = [min, max]$ with $min$ giving the
smallest and $max$ the largest value $x$ is allowed to assume. Alternatively,
for categorical data types, a range is defined by enumerating all possible
values the variable can assume: $R(x) = [v_1, \ldots, v_n]$.

Definition 4.2.3 (State Representation). A problem state $s$ is a conjunction
of atoms over relational symbols in $\mathcal{R}_P$. That is,
$s \in \mathcal{L}(\mathcal{C}, \mathcal{F}, \mathcal{R}_P)$.

We presuppose that terms are always evaluated if they are grounded. Evalu- 
ation of terms is defined in the usual way (Field & Harrison, 1988; Ehrig & 
Mahr, 1985): 

Definition 4.2.4 (Evaluation of Ground Terms). A term $t$ over
$\mathcal{L}(\mathcal{C}, \mathcal{F})$ is evaluated in the following way:

- $eval(c) = c$, for $c \in \mathcal{C}$

- $eval(f(t_1, \ldots, t_n)) = apply(f(eval(t_1), \ldots, eval(t_n)))$, for
  $f \in \mathcal{F}$ and $t_1, \ldots, t_n$ as terms over
  $\mathcal{C} \cup \mathcal{F}$.

Function application $apply$ returns a unique value for $f(c_1, \ldots, c_n)$.
For constructor functions (such as the list constructor [ ]) we define
$apply(f_c(c_1, \ldots, c_n)) = f_c(c_1, \ldots, c_n)$.

For example, eval(cdr([1 2 3 ∞])) returns [2 3 ∞]. Evaluation of eval(plus(3,
minus(5, 1))) returns 7. Evaluated states are sets of relational symbols over
constant symbols and complex structures, represented by constructor functions
over constant symbols. For example, the relational symbol on in state
description {on([2 3 ∞], p1), on([∞], p2), on([1 ∞], p3)} has constant
arguments p1, p2, and p3, and complex arguments [2 3 ∞], [∞], and [1 ∞].
Relational symbols in $\mathcal{R}_C$ can be considered as special terms, namely
terms that evaluate to a truth value, also called boolean terms. In contrast,
relations in $\mathcal{R}_P$ are true if they match with the current state and
false otherwise. In the following, we will speak of evaluation of expressions
when referring to terms including such boolean terms.
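Term evaluation as defined above can be sketched in a few lines. The planner
itself relies on Common Lisp's eval; the Python version below is only an
illustration, and the class `App`, the table `SYMBOLS`, and the use of
`math.inf` for the dummy bottom disk ∞ are assumptions of this sketch.

```python
import math
from dataclasses import dataclass

INF = math.inf  # stands in for the dummy bottom disk


@dataclass(frozen=True)
class App:
    """A function or constraint symbol applied to a tuple of argument terms."""
    fn: str
    args: tuple


# Hypothetical symbol table; the list constructor simply keeps its arguments.
SYMBOLS = {
    "list": lambda *xs: tuple(xs),
    "car": lambda l: l[0],
    "cdr": lambda l: l[1:],
    "cons": lambda x, l: (x,) + l,
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
    "neq": lambda a, b: a != b,   # boolean term
    "lt": lambda a, b: a < b,     # boolean term
}


def eval_term(t):
    """Constants evaluate to themselves; applications evaluate their arguments first."""
    if isinstance(t, App):
        return SYMBOLS[t.fn](*(eval_term(a) for a in t.args))
    return t


tower = App("list", (1, 2, 3, INF))
rest = eval_term(App("cdr", (tower,)))                            # (2, 3, inf)
seven = eval_term(App("plus", (3, App("minus", (5, 1)))))         # 7
top_is_disk = eval_term(App("neq", (App("car", (tower,)), INF)))  # True
```

Boolean terms are handled by the same evaluator; they differ from other terms
only in that they return a truth value.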

Definition 4.2.5 (Evaluation of Expressions over Ground Terms). An expression
over $\mathcal{L}(\mathcal{C}, \mathcal{F}, \mathcal{R}_C)$ is evaluated in the
following way:

- $eval(c)$, $eval(f(t_1, \ldots, t_n))$ as in def. 4.2.4.

- $eval(r(t_1, \ldots, t_n)) = apply(r(eval(t_1), \ldots, eval(t_n)))$, for
  $r \in \mathcal{R}_C$ and $t_1, \ldots, t_n$ as terms over
  $\mathcal{C} \cup \mathcal{F} \cup \mathcal{R}_C$ with
  $eval(r(t_1, \ldots, t_n)) \in \{TRUE, FALSE\}$.

For a set of atoms over $\mathcal{L}(\mathcal{C}, \mathcal{F}, \mathcal{R}_C)$,
i.e., a conjunction of boolean expressions, evaluation is extended to:

$$eval(\{r_1(t_1), \ldots, r_n(t_n)\}) = \begin{cases} TRUE & \text{if } r_1(t_1) = TRUE \text{ and } \ldots \text{ and } r_n(t_n) = TRUE \\ FALSE & \text{else.} \end{cases}$$

For example, {car([1 2 3 ∞]) ≠ ∞, car([1 2 3 ∞]) < car([∞])} evaluates
to TRUE. In our planner, evaluation of expressions is realized by calling the
meta-function eval provided by Common Lisp.

Before introducing goals and operators defined over
$\mathcal{L}(\mathcal{C}, \mathcal{F}, \mathcal{R})$, we extend the definitions
for substitution and matching (def. 2.1.3):

Definition 4.2.6 (Product of Substitutions). The composition of substitutions
$\sigma \circ \sigma'$ is the set of all compatible pairs
$(x \leftarrow t) \in \sigma$ and $(x' \leftarrow t') \in \sigma'$ with
$x, x' \in \mathcal{X}$ and
$t, t' \in \mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F})$. Substitutions are
compatible iff for each $(x \leftarrow t) \in \sigma$ there does not exist any
$(x' \leftarrow t') \in \sigma'$ with $x = x'$ and $t \neq t'$. The product of
sets of substitutions $\Sigma = \{\sigma_1, \ldots, \sigma_n\}$ and
$\Sigma' = \{\sigma'_1, \ldots, \sigma'_m\}$ is defined as pairwise composition
$\Sigma \cup \Sigma' = \sigma_i \circ \sigma'_j$ for $i = 1 \ldots n$,
$j = 1 \ldots m$.

Definition 4.2.7 (Matching and Assignment). Let $F = F_P \cup F_C$ be a
formula with

- $F_P = \{p_1(t_1), \ldots, p_i(t_i)\} \in \mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F}, \mathcal{R}_P)$ and

- $F_C = \{r_1(t'_1), \ldots, r_j(t'_j)\} \in \mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F}, \mathcal{R}_C)$, and let

- $A \in \mathcal{L}(\mathcal{C}, \mathcal{F}, \mathcal{R}_P)$ be a set of evaluated atoms.

The set of substitutions $\Sigma$ for $F$ with respect to $A$ is defined as
$\Sigma_P \cup \Sigma_C$ with $F_{P\sigma_i} \subseteq A$ and
$eval(F_{C\sigma_i}) = TRUE$ for all $\sigma_i \in \Sigma$.

- $\Sigma_P$ is calculated by $match(F_P, A)$ as specified in definition 2.1.3.

- $\Sigma_C$ is calculated by $ass(\mathcal{X}_C)$ for all free variables
  $x_1, \ldots, x_n = \mathcal{X}_C(F_C)$ such that for all
  $\sigma_i \in \Sigma_C$ holds
  $\sigma_i = \{x_1 \leftarrow c_1, \ldots, x_n \leftarrow c_n\}$ where
  $c_i \in R(x_i)$ (constants $c_i$ in the range of variable $x_i$, see
  def. 4.2.2).

Definition 4.2.7 extends matching of formulas with sets of atoms in such a
way that only those matches are allowed where substitutions additionally
fulfill the constraints.

Consider the state

A = {on([2 3 ∞], p1), on([∞], p2), on([1 ∞], p3)}.

Formula PRE of the move operator given in figure 4.1.b,

F = {on(?li, ?pi), on(?lj, ?pj), car(?li) ≠ ∞, car(?li) < car(?lj)},

can be instantiated to $\Sigma = \{\sigma_1, \sigma_2, \sigma_3\}$ with

$\sigma_1$ = {?pi ← p1, ?pj ← p2, ?li ← [2 3 ∞], ?lj ← [∞]},
$\sigma_2$ = {?pi ← p3, ?pj ← p2, ?li ← [1 ∞], ?lj ← [∞]},
$\sigma_3$ = {?pi ← p3, ?pj ← p1, ?li ← [1 ∞], ?lj ← [2 3 ∞]}



but not to

$\sigma_4$ = {?pi ← p1, ?pj ← p3, ?li ← [2 3 ∞], ?lj ← [1 ∞]},
$\sigma_5$ = {?pi ← p2, ?pj ← p1, ?li ← [∞], ?lj ← [2 3 ∞]},
$\sigma_6$ = {?pi ← p2, ?pj ← p3, ?li ← [∞], ?lj ← [1 ∞]}.

A procedural realization for calculating all instantiations of a formula
with respect to a set of atoms is to first calculate all matches and then
modify $\Sigma$ by stepwise introducing the constraints. For the example above,
we first calculate
$\Sigma = \{\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5, \sigma_6\}$
considering only the sub-formula $F_P$ = {on(?li, ?pi), on(?lj, ?pj)}. Next,
constraint car(?li) ≠ ∞ is introduced, reducing $\Sigma$ to
$\Sigma = \{\sigma_1, \sigma_2, \sigma_3, \sigma_4\}$. Finally, we obtain
$\Sigma = \{\sigma_1, \sigma_2, \sigma_3\}$ by applying the constraint
car(?li) < car(?lj).
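The stepwise procedure can be mirrored directly in code. The sketch below is
written in Python for illustration; the state encoding (pairs of a disk tuple
and a peg name), the dictionary form of substitutions, and the assumption that
the two on-literals are matched against distinct atoms (which yields the six
substitutions listed in the text) are choices of this example.

```python
import math
from itertools import permutations

INF = math.inf  # dummy bottom disk

# State A from the text: each atom on(list, peg) is a pair (disks, peg).
A = {((2, 3, INF), "p1"), ((INF,), "p2"), ((1, INF), "p3")}


def match_pre(state):
    """All matches of {on(?li, ?pi), on(?lj, ?pj)} against distinct atoms."""
    atoms = sorted(state, key=lambda atom: atom[1])
    return [{"li": li, "pi": pi, "lj": lj, "pj": pj}
            for (li, pi), (lj, pj) in permutations(atoms, 2)]


# Constraints of the move operator, checked on instantiated variables.
CONSTRAINTS = [
    lambda s: s["li"][0] != INF,         # car(?li) ≠ ∞
    lambda s: s["li"][0] < s["lj"][0],   # car(?li) < car(?lj)
]


def instantiate(state, constraints):
    """Compute all matches, then filter them constraint by constraint."""
    sigma = match_pre(state)
    sizes = [len(sigma)]
    for constraint in constraints:
        sigma = [s for s in sigma if constraint(s)]
        sizes.append(len(sigma))
    return sigma, sizes


sigma, sizes = instantiate(A, CONSTRAINTS)
legal_moves = {(s["pi"], s["pj"]) for s in sigma}
```

The list `sizes` records how the set of substitutions shrinks from six to four
to three, matching the reduction described above.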

If the constraints contain free variables, each substitution in $\Sigma$ is
combined with instantiations of these variables as specified in definition
4.2.6. An example for instantiating formulas with free variables is given in
section 4.4.5.

Definition 4.2.8 (Goal Representation). A goal $\mathcal{G}$ is a formula in
$\mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F}, \mathcal{R})$.

The goal given for Hanoi in figure 4.1.b can alternatively be represented by:

{on([1 2 3 ∞], p3), on(?li, ?pi), on(?lj, ?pj), ∞ = car(?li), ∞ = car(?lj)}.

Definition 4.2.9 (FPlan Operator). An FPlan operator $op$ is described
by preconditions (PRE), ADD and DEL lists, and updates (UP) with
$PRE = PRE_P \cup PRE_C \in \mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F}, \mathcal{R}_P \cup \mathcal{R}_C)$
and $ADD, DEL \in \mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F}, \mathcal{R}_P)$.

An update UP is a list of function applications, specifying that a function
$f(t) \in \mathcal{L}(\mathcal{X}, \mathcal{C}, \mathcal{F})$ is applied to a
term $t'$ which is an argument of literal $p(t')$ with $p \in \mathcal{R}_P$.

Updates overwrite the value of an argument of a literal. Realizing an
update using ADD/DEL effects would mean first calculating the new value
of an argument, then deleting the literals in which this argument occurs, and
finally adding the literals with the new argument values. That is, modeling
updates as a specific process is more efficient. An example for an update
effect is given in figure 4.1.b:

change ?li in on(?li, ?pi) to cdr(?li)

defines that variable ?li in literal on(?li, ?pi) is updated to cdr(?li). If
more than one update effect is given in an operator, the updates are performed
sequentially from the uppermost to the last update. An update effect cannot
contain free variables, that is, $\mathcal{X}(UP) \subseteq \mathcal{X}(PRE)$.

Definition 4.2.10 (Updating a State). For a state
$s = \{p_1(t_1), \ldots, p_n(t_n)\}$ and an instantiated operator $o$ with update
effects $u \in UP$, updating is defined as replacement $t' := eval(f(t))$ for a
fixed argument $t'$ in a fixed literal $p_j(t') \in s$ with
$\mathcal{X}(t), \mathcal{X}(t') \subseteq \mathcal{X}(PRE)$.

For short we write $update(u, s)$.

Applying a list of updates $UP = (u_1, \ldots, u_n)$ to a state $s$ is defined as
$update(UP, s) = update(u_n, update(u_{n-1}, \ldots, update(u_1, s)))$.

For the initial state of hanoi in figure 4.1, s = {on([1 2 3 ∞], p1),
on([∞], p2), on([∞], p3)}, the update effects of the move operator can be
instantiated to

u1 = change [∞] in on([∞], p3) to cons(car([1 2 3 ∞]), [∞])
u2 = change [1 2 3 ∞] in on([1 2 3 ∞], p1) to cdr([1 2 3 ∞]).

Applying these updates to s results in:

update((u1, u2), {on([1 2 3 ∞], p1), on([∞], p2), on([∞], p3)}) =
update(u2, {on([1 2 3 ∞], p1), on([∞], p2),
            on(eval(cons(car([1 2 3 ∞]), [∞])), p3)}) =
update(u2, {on([1 2 3 ∞], p1), on([∞], p2), on([1 ∞], p3)}) =
{on(eval(cdr([1 2 3 ∞])), p1), on([∞], p2), on([1 ∞], p3)} =
{on([2 3 ∞], p1), on([∞], p2), on([1 ∞], p3)}.

Note that the operator is fully instantiated before the update effects are
calculated. The current substitution $\sigma \in \Sigma$ is not affected by
updating. That is, if the value of an instantiated variable is changed, as for
example $L_2$ = [∞] is changed to $L_2$ = [1 ∞], this change remains local to
the literal specified in the update.

Definition 4.2.11 ((Forward) Operator Application). For a state $s$
and an instantiated operator $o$, operator application is defined as
$Res(o, s) = update(UP, s \setminus DEL(o) \cup ADD(o))$ if
$PRE_P(o) \subseteq s$ and $eval(PRE_C(o)) = TRUE$.

ADD/DEL effects are always calculated before updating. Examples for
operators with combined ADD/DEL and update effects are given in section 4.4.

An FPlan planning problem is given as
$P(\mathcal{O}, \mathcal{I}, \mathcal{G})$, where operators $\mathcal{O}$,
initial states $\mathcal{I}$, and goals $\mathcal{G}$ are defined over the FPlan
language, which extends Strips by allowing function applications. A plan for
the hanoi problem specified in figure 4.1.b is given in figure 4.3.
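Forward application of the functional move operator can be sketched as follows.
This Python fragment is an illustration, not the planner's implementation: the
state is simplified to a dictionary from peg names to disk tuples, `math.inf`
plays the role of ∞, and the seven-step plan is the standard solution for three
disks.

```python
import math

INF = math.inf  # dummy bottom disk


def move(state, pi, pj):
    """Apply move(?pi, ?pj) forward; return None if a constraint is violated."""
    li, lj = state[pi], state[pj]     # instantiate ?li and ?lj by matching
    if not (li[0] != INF and li[0] < lj[0]):
        return None
    new_state = dict(state)
    # Updates run in the listed order and use the values fixed at instantiation.
    new_state[pj] = (li[0],) + lj     # change ?lj to cons(car(?li), ?lj)
    new_state[pi] = li[1:]            # change ?li to cdr(?li)
    return new_state


initial = {"p1": (1, 2, 3, INF), "p2": (INF,), "p3": (INF,)}
goal = {"p1": (INF,), "p2": (INF,), "p3": (1, 2, 3, INF)}
plan = [("p1", "p3"), ("p1", "p2"), ("p3", "p2"), ("p1", "p3"),
        ("p2", "p1"), ("p2", "p3"), ("p1", "p3")]

state = initial
for pi, pj in plan:
    state = move(state, pi, pj)
```

Since the operator has no ADD/DEL effects, only the update part of the
definition is exercised here; an operator with both would first apply the
ADD/DEL effects and then the updates.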



4.3 Extensions of FPlan 

4.3.1 Backward Operator Application 

As described in section 2.1 in chapter 2, plan construction can be performed 
using either forward or backward search in the state space. To make FPlan 
applicable for backward planners, we introduce backward operator applica- 
tion. 

Backward operator application for Strips (see def. 2.1.7) was defined so
that an operator is backward applicable in a state $s$ if its ADD list can be
matched with the current state, resulting in a predecessor state $s'$ where
all elements of the ADD list are removed and which contains the literals of
PRE and DEL. For operators containing update effects and constraints, the
definition of backward operator application has to be extended with respect
to applicability conditions and calculation of effects.

((ON [1 2 3 ∞] P1) (ON [∞] P2) (ON [∞] P3))
    (MOVE P1 P3)
((ON [2 3 ∞] P1) (ON [∞] P2) (ON [1 ∞] P3))
    (MOVE P1 P2)
((ON [3 ∞] P1) (ON [2 ∞] P2) (ON [1 ∞] P3))
    (MOVE P3 P2)
((ON [3 ∞] P1) (ON [1 2 ∞] P2) (ON [∞] P3))
    (MOVE P1 P3)
((ON [∞] P1) (ON [1 2 ∞] P2) (ON [3 ∞] P3))
    (MOVE P2 P1)
((ON [1 ∞] P1) (ON [2 ∞] P2) (ON [3 ∞] P3))
    (MOVE P2 P3)
((ON [1 ∞] P1) (ON [∞] P2) (ON [2 3 ∞] P3))
    (MOVE P1 P3)
((ON [∞] P1) (ON [∞] P2) (ON [1 2 3 ∞] P3))

Fig. 4.3. A Plan for Tower of Hanoi

An operator is backward applicable if its ADD list can be matched with
the current state $s$ and if those constraints that hold after forward operator
application hold in $s$. We call such constraints the postcondition (POST) of an
operator. In many domains, the constraints holding in a state after forward
operator application are the inverse of the constraints specified in the
operator precondition. For example, for the hanoi domain, the constraints for
forward application are {car(?li) ≠ ∞, car(?li) < car(?lj)} where ?li represents
the disks lying on peg ?pi and ?lj represents the disks lying on peg ?pj (see
fig. 4.1.b). After executing move(?pi, ?pj), the "top" disk of ?li is inserted
as head of list ?lj, and the constraints {car(?lj) ≠ ∞, car(?lj) < car(?li)}
hold. The first constraint must hold because a disk is moved to peg ?pj and
therefore ?lj cannot be the empty peg represented as list [∞]. The second
constraint must hold because all legal states in the hanoi domain contain
stacks of disks with increasing size. Because x < y holds for list
?li = [x y ...], and because x is the new head element of ?lj after executing
move, the new head element of list ?li (y) is larger than the new head element
of list ?lj (x).

To calculate a backward update effect, the inverse function $f^{-1}$ to update
function $f$ must be known. In general, updating can involve a sequence
of function applications. That is, $f = f_1 \circ \ldots \circ f_n$. To make
backward plan construction compatible with forward plan execution, update
effects must be restricted to bijective functions:
$f(x) = y \Leftrightarrow f^{-1}(y) = x$. For the move operator in the hanoi
domain, the inverse function of removing the top element of a tower (i.e., the
head element of the list ?li) and inserting it as the head of ?lj is to remove
the first element of ?lj and insert it as head of ?li.

We do not propose an algorithm for automatic construction of correct
constraints for backward application and inverse functions. Instead, we
require that for each operator $op$ its inverse operator $op^{-1}$ has to be
defined in the domain specification. It is the responsibility of the author of
a domain to guarantee that $op^{-1}$ is the inverse operator of $op$; that is,
that $Res(o, s') = s$ holds for $Res^{-1}(o^{-1}, s) = s'$ for all operators.

When defining an inverse operator, the (inverse) precondition represents 
the conditions which must hold after forward application of the original oper- 
ator - i. e. the inverse precondition corresponds to the original postcondition; 
the updates represent the inverse function as discussed above. The inverse 
operator move~^ for hanoi is: 

Operator: move^{-1}(?p_i, ?p_j)
PRE:      {on(?l_i, ?p_i), on(?l_j, ?p_j), car(?l_j) ≠ ∞, car(?l_i) > car(?l_j)}
UPDATE:   change ?l_i in on(?l_i, ?p_i) to cons(car(?l_j), ?l_i)
          change ?l_j in on(?l_j, ?p_j) to cdr(?l_j)



Because update effects are explicitly defined for inverse operator application, 
updates are performed as specified in definition 4.2.10. 

For state s = {on([2 3 ∞], p1), on([∞], p2), on([1 ∞], p3)}, backward application
of the instantiated operator move^{-1}(p1, p3) results in s' = {on([1 2 3 ∞],
p1), on([∞], p2), on([∞], p3)}.
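
The backward application above can be traced with a small sketch. It is only an illustration under stated assumptions: pegs are dictionary keys, lists are Python tuples terminated by an infinity sentinel, and the function names `move` and `move_inverse` are hypothetical, not part of DPlan.

```python
import math

INF = math.inf  # sentinel for the bottom of every peg (written as ∞ in the text)

def move(state, pi, pj):
    """Forward move: take the top disk of peg pi and put it on peg pj."""
    li, lj = state[pi], state[pj]
    # Forward constraints: car(li) != INF and car(li) < car(lj)
    if li[0] == INF or not li[0] < lj[0]:
        return None  # operator not applicable
    new_state = dict(state)
    new_state[pj] = (li[0],) + lj   # cons(car(li), lj)
    new_state[pi] = li[1:]          # cdr(li)
    return new_state

def move_inverse(state, pi, pj):
    """Backward application: undo a move that went from peg pi to peg pj."""
    li, lj = state[pi], state[pj]
    # Inverse precondition = forward postcondition:
    # car(lj) != INF and car(li) > car(lj)
    if lj[0] == INF or not li[0] > lj[0]:
        return None
    new_state = dict(state)
    new_state[pi] = (lj[0],) + li   # cons(car(lj), li)
    new_state[pj] = lj[1:]          # cdr(lj)
    return new_state

s = {"p1": (2, 3, INF), "p2": (INF,), "p3": (1, INF)}
s_prev = move_inverse(s, "p1", "p3")
print(s_prev)                          # predecessor state
print(move(s_prev, "p1", "p3") == s)   # forward move restores s
```

Applying `move` to the computed predecessor gives back the state we started from, which is the round-trip property the domain author has to guarantee for each operator and its inverse.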

For operators defined with mixed ADD/DEL and update effects the inverse
operator is also defined explicitly, with inverted precondition, ADD
and DEL list. Application of an inverted operator can then be performed 
as specified in definition 4.2.11 - that is, backward operator application is 
replaced by forward application of the inverted operator. Examples for back- 
ward application of operators with combined ADD/DEL and update effects 
are given in section 4.4. 

4.3.2 Introducing User-Defined Functions 

As defined in section 4.2, the set of function symbols F and the set of relational
symbols R_C of FPlan can contain arbitrary built-in and user-defined
functions. When parsing a problem specification, each symbol which is part of
Common Lisp and each symbol corresponding to the name of a user-defined
function is treated as an expression to be evaluated. Symbols in R_C have to
evaluate to a truth value.

An example specification of hanoi using built-in and user-defined functions
is given in figure 4.4.¹ For the Towers of Hanoi domain the FPlan language
is instantiated to X = {?p_i, ?p_j, ?l_i, ?l_j}, C = {p1, p2, p3, 1, 2, 3, nil},
F = {[ ], car^1, cdr^1, cons^2}, R_P = {on^2}, R_C = {not-empty^1, legal^2}. The
empty list [ ] corresponds to nil. The user-defined functions can be defined
over additional built-in and user-defined functions which are not included in
C. For a Lisp-implemented planner - like our system DPlan - after reading
in a problem specification, all declared user-defined functions are appended
to the set of built-in Lisp functions and interpreted in the usual manner.



Operator: move(?p_i, ?p_j)
PRE:      {on(?l_i, ?p_i), on(?l_j, ?p_j), not-empty(?l_i), legal(?l_i, ?l_j)}
UPDATE:   change ?l_j in on(?l_j, ?p_j) to cons(car(?l_i), ?l_j)
          change ?l_i in on(?l_i, ?p_i) to cdr(?l_i)

Goal:          {on([ ], p1), on([ ], p2), on([1 2 3], p3)}
Initial State: {on([1 2 3], p1), on([ ], p2), on([ ], p3)}
Functions:     not-empty(l) = not(null(l))
               legal(l1, l2) = if null(l2) then TRUE else <(car(l1), car(l2))

Fig. 4.4. Tower of Hanoi with User-Defined Functions
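
To make the role of the user-defined functions concrete, the following sketch re-expresses the specification of figure 4.4 in Python. It is not the DPlan implementation: tuples stand in for Lisp lists, pegs are addressed by index, and the breadth-first enumeration in `reachable` is a hypothetical helper added only to check the state count.

```python
from collections import deque
from itertools import permutations

def not_empty(l):
    # not-empty(l) = not(null(l))
    return len(l) > 0

def legal(l1, l2):
    # legal(l1, l2) = if null(l2) then TRUE else car(l1) < car(l2)
    return True if not l2 else l1[0] < l2[0]

def move(state, i, j):
    """Forward move from peg i to peg j; returns None if not applicable."""
    li, lj = state[i], state[j]
    if not (not_empty(li) and legal(li, lj)):
        return None
    new = list(state)
    new[j] = (li[0],) + lj   # cons(car(li), lj)
    new[i] = li[1:]          # cdr(li)
    return tuple(new)

def reachable(initial):
    """Enumerate all states reachable by forward moves (breadth-first)."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for i, j in permutations(range(len(s)), 2):
            t = move(s, i, j)
            if t is not None and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

initial = ((1, 2, 3), (), ())
goal = ((), (), (1, 2, 3))
states = reachable(initial)
print(len(states), goal in states)
```

For three disks the enumeration yields 27 states, the number reported for this problem size in table 4.2.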



4.4 Examples 

In the previous sections, we presented FPlan with a functional version of 
the hanoi domain, demonstrating how operators with ADD/DEL effects can 
be alternatively modeled with update effects. In the following, we will give a 
variety of examples for problem specifications with FPlan. First we will show
how problems involving resource constraints can be modeled. Afterwards, we 
will give examples for a numerical domain and domain specifications involving 
operators with mixed ADD/DEL effects and updates. We will show how 
standard programming problems can be modeled in planning, and finally we 
will present preliminary ideas for combining planning with constraint logic 
programming. 

For selected problems we include empirical comparisons, demonstrating
that plan construction with function updates is significantly more efficient
than plan construction using ADD/DEL effects. We tested all examples with
our backward planner DPlan (Schmid & Wysotzki, 2000b). Therefore, we
present standard forward operator definitions together with the inverse
operators as specified in section 4.3.1.

¹ For our planning system, functions are defined in Lisp syntax, for example:
(defun legal (l1 l2) (if (null l2) T (< (car l1) (car l2)))).

4.4.1 Planning with Resource Variables 

Planning with resource constraints can be modeled as a special case of func- 
tion updates with FPlan. As an example we use an airplane domain (Koehler, 
1998). A problem specification in FPlan is given in figure 4.5. 



Operator: fly(?plane, ?x, ?y)
PRE:      {at(?plane, ?x), fuel-resource(?plane, ?fuel),
           ?fuel > distance(?x, ?y)/3}
ADD:      {at(?plane, ?y)}
DEL:      {at(?plane, ?x)}
UPDATE:   change ?fuel in fuel-resource(?plane, ?fuel)
            to calc-fuel(?fuel, ?x, ?y)
          change ?time in time-resource(?plane, ?time)
            to calc-time(?time, ?x, ?y)

Goal:          {at(p1, berlin)}
Initial State: {at(p1, london), fuel-resource(p1, 750), time-resource(p1, 0)}
Functions:     calc-fuel(?fuel, ?x, ?y) = ?fuel - distance(?x, ?y)/3
               calc-time(?time, ?x, ?y) = ?time + 3/20 * distance(?x, ?y)

Fig. 4.5. A Problem Specification for the Airplane Domain



The operator fly specifies what happens when an airplane ?plane flies
from airport ?x to airport ?y. We consider two resources: the amount of fuel
and the amount of time the plane needs to fly from one airport to another. In
the initial state the tank is filled to capacity (fuel-resource(750)) and no time
has yet been spent (time-resource(0)). A resource constraint that must hold
before the action of flying the plane from one airport ?x to another airport
?y can be carried out would be: at(?plane, ?x), (?fuel > distance(?x, ?y)/3).
That is, the plane has to be at airport ?x and there has to be enough fuel in
the tank to travel the distance from ?x to ?y.

The distance between two airports is obtained by calling the function
distance(?x, ?y), which returns the distance value by consulting an underlying
database (see table 4.1). The distances can alternatively be modeled as static
predicates (for example distance(berlin, paris, 540), etc.). Using a database
query function is more efficient than modeling the distances with static
predicates, because static predicates have to be encoded in the current state and
finding a certain distance (instantiating the literal) requires matching, which
takes considerably longer than a database query.

Table 4.1. Database with Distances between Airports

            Berlin    Paris     London    New York
Berlin      -         540 m     570 m     3960 m
Paris       540 m     -         210 m     3620 m
London      570 m     210 m     -         3460 m
New York    3960 m    3620 m    3460 m    -



The position of the plane ?plane is modeled with the literal at. When
flying from ?x to ?y the literal at(?plane, ?x) is deleted from and the literal
at(?plane, ?y) is added to the set of literals describing the current state. The
resource variable ?fuel described by the relational symbol fuel-resource is
updated with the result of the user-defined function calc-fuel, which calculates
the consumption of fuel according to the traveled distance. The resource
variable ?time described by the relational symbol time-resource is updated
in a similar way, assuming that it takes 3/20 * distance(?x, ?y) to fly the
distance from airport ?x to ?y.
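
The combined ADD/DEL and update effect of fly can be illustrated with a short sketch. The state layout (one dictionary per relational symbol) and the lookup table keyed by unordered airport pairs are assumptions made for this example; only the distances, the precondition, and the two update functions are taken from figure 4.5 and table 4.1.

```python
# Distances from table 4.1, keyed by unordered airport pairs (assumed layout).
DISTANCE = {
    frozenset(("berlin", "paris")): 540,
    frozenset(("berlin", "london")): 570,
    frozenset(("berlin", "new york")): 3960,
    frozenset(("paris", "london")): 210,
    frozenset(("paris", "new york")): 3620,
    frozenset(("london", "new york")): 3460,
}

def distance(x, y):
    """Database query replacing static distance predicates."""
    return DISTANCE[frozenset((x, y))]

def calc_fuel(fuel, x, y):
    return fuel - distance(x, y) / 3

def calc_time(time, x, y):
    return time + 3 * distance(x, y) / 20

def fly(state, plane, x, y):
    """Forward application of fly; returns None if the precondition fails."""
    at, fuel, time = state["at"], state["fuel"], state["time"]
    if at[plane] != x or not fuel[plane] > distance(x, y) / 3:
        return None
    return {
        "at": {**at, plane: y},                                   # DEL at(x), ADD at(y)
        "fuel": {**fuel, plane: calc_fuel(fuel[plane], x, y)},    # update ?fuel
        "time": {**time, plane: calc_time(time[plane], x, y)},    # update ?time
    }

initial = {"at": {"p1": "london"}, "fuel": {"p1": 750}, "time": {"p1": 0}}
after = fly(initial, "p1", "london", "berlin")
print(after)
print(fly(initial, "p1", "london", "new york"))  # not enough fuel
```

Because fuel and time are stored per plane, adding a second plane only means adding further entries to the three dictionaries.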

When ADD/DEL and update effects occur together in one operator,
the update effect is carried out on the state obtained after calculating the
ADD/DEL effect (definition 4.2.11). For backward planning we have to specify
a corresponding inverse operator fly^{-1} (see fig. 4.6). The precondition for
fly^{-1} requests that the plane be at airport ?y and that the value of the time
resource be larger than the time it takes to fly the distance between airport ?x
and airport ?y. The resource variables ?fuel and ?time are updated with the
result of the inverse functions (calc-fuel^{-1} and calc-time^{-1}).



Operator:  fly^{-1}(?plane, ?x, ?y)
PRE:       {at(?plane, ?y), time-resource(?plane, ?time),
            fuel-resource(?plane, ?fuel),
            ?time > (distance(?x, ?y) * 3/20)}
ADD:       {at(?plane, ?x)}
DEL:       {at(?plane, ?y)}
UPDATE:    change ?fuel in fuel-resource(?plane, ?fuel)
             to calc-fuel^{-1}(?fuel, ?x, ?y)
           change ?time in time-resource(?plane, ?time)
             to calc-time^{-1}(?time, ?x, ?y)
Functions: calc-fuel^{-1}(?fuel, ?x, ?y) = ?fuel + distance(?x, ?y)/3 | 750
             ; 750 is the maximum capacity of the tank
           calc-time^{-1}(?time, ?x, ?y) = ?time - 3/20 * distance(?x, ?y)

Fig. 4.6. Specification of the Inverse Operator fly^{-1} for the Airplane Domain



Updating of state variables as proposed in FPlan is more flexible than handling
resource variables separately (as in Koehler (1998)). While in Koehler
(1998), fuel and time are global variables, in FPlan the current values of fuel
and time are arguments of the relational symbols fuel-resource(?plane, ?fuel) and
time-resource(?plane, ?time). Therefore, while modeling a problem involving
more than one plane can easily be done in FPlan, this is not possible with
Koehler’s approach. For example, we can specify the following goal and initial
state:

Goal:          {at(p1, berlin), at(p2, paris)}
Initial State: {at(p1, berlin), fuel-resource(p1, 750), time-resource(p1, 0),
                at(p2, paris), fuel-resource(p2, 750), time-resource(p2, 0)}



Modeling domains with time or cost resources is simple when function 
applications are allowed. Typical examples are job scheduling problems - 
for example the machine shop domain presented in (Veloso et al., 1995). To
model time steps, a relational symbol time(?t) can be introduced. Time ?t is
initially zero and each operator application results in ?t being incremented by
one step. To model a machine that is occupied during a certain time interval, 
a relational symbol occupied(?m, ?o) can be used where ?m represents the 
name of a machine and ?o the last time slot where it is occupied with ?o = 0 
representing that the machine is free to be used. For each operator involving 
the usage of a machine, a precondition requesting that the machine is free for 
the current time slot can be introduced. If a machine is free to be used, the 
occupied relation is updated by adding the current time and the amount of 
time steps the executed action requests. It can also be modeled that occupa- 
tion time does not only depend on the kind of action performed but also on 
the kind of object involved (e. g., polishing a large object could need three 
time steps, while polishing a small object needs only one time step). 

4.4.2 Planning for Numerical Problems — The Water Jug Domain 

Numerical domains such as the water jug domain presented by Pednault (1994)
cannot be modeled with a Strips-like representation. Figure 4.7 shows an
FPlan specification of the water jug domain: We have three jugs of different
volumes and different capacities. The operator pour models the action of
pouring water from one jug ?j1 into another jug ?j2 until either ?j1 is empty
or ?j2 is filled to capacity. The capacities of the jugs are statics while the
actual volumes are fluents. The resulting volume ?v1 for jug ?j1 is either
zero or what remains in the jug when pouring as much water as possible into
jug ?j2: max[0, ?v1 - ?c2 + ?v2]. The resulting volume ?v2 for jug ?j2 is
either its capacity or its previous volume plus the volume of the first jug ?j1:
min[?c2, ?v1 + ?v2].
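
The two update expressions can be checked with a minimal sketch of the forward pour operator. The dictionary-based state and the name `CAPACITY` are assumptions for illustration; the capacities and the starting volumes are those of the problem specification in figure 4.7.

```python
# Static capacities of the three jugs (figure 4.7).
CAPACITY = {"a": 36, "b": 45, "c": 54}

def pour(volumes, j1, j2):
    """Pour from jug j1 into jug j2 until j1 is empty or j2 is full."""
    v1, v2, c2 = volumes[j1], volumes[j2], CAPACITY[j2]
    # Precondition: the target jug is not full and the source jug is not empty.
    if not (v2 < c2 and v1 > 0):
        return None
    new = dict(volumes)
    new[j1] = max(0, v1 - c2 + v2)   # what remains in the source jug
    new[j2] = min(c2, v1 + v2)       # target jug is full or holds everything
    return new

state = {"a": 16, "b": 7, "c": 34}
result = pour(state, "c", "a")
print(result)
print(sum(result.values()) == sum(state.values()))  # no water is lost
```

After the step, jug a is filled to its capacity, so a second pour(c, a) is not applicable, and the total amount of water is unchanged.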

After pouring water from one jug into another, the postcondition (?v2 =
?c2) or (?v1 = 0) must hold. That is, the volume ?v1 of the first jug ?j1
must be zero or the volume ?v2 of the second jug ?j2 must equal its capacity
?c2. When specifying the inverse operator pour^{-1} for backward planning this
postcondition becomes the precondition of pour^{-1}. The update functions are
inverted: The volume ?v1 of the first jug ?j1 is updated with the result of
min[?c1, ?v1 + ?v2] and the volume ?v2 of the second jug ?j2 is updated with
the result of max[0, ?v2 - ?c1 + ?v1]. The resulting plan for the given problem
specification is shown in figure 4.8.

Operator: pour(?j1, ?j2)
PRE:      {volume(?j1, ?v1), volume(?j2, ?v2), capacity(?j2, ?c2),
           (?v2 < ?c2), (?v1 > 0)}
UPDATE:   change ?v1 in volume(?j1, ?v1) to max(0, ?v1 - ?c2 + ?v2)
          change ?v2 in volume(?j2, ?v2) to min(?c2, ?v1 + ?v2)

Operator: pour^{-1}(?j1, ?j2)
PRE:      {volume(?j1, ?v1), volume(?j2, ?v2), capacity(?j2, ?c2),
           ((?v2 = ?c2) or (?v1 = 0))}
UPDATE:   change ?v1 in volume(?j1, ?v1) to max(0, ?v1 - ?c2 + ?v2)
          change ?v2 in volume(?j2, ?v2) to min(?c2, ?v1 + ?v2)

Statics:       {capacity(a, 36), capacity(b, 45), capacity(c, 54)}
Goal:          {volume(a, 25), volume(b, 0), volume(c, 52)}
Initial State: {volume(a, 16), volume(b, 7), volume(c, 34)}

Fig. 4.7. A Problem Specification for the Water Jug Domain



((VOLUME A 25) (VOLUME B 0) (VOLUME C 52))
    (POUR-1 B A)
((VOLUME A 0) (VOLUME B 25) (VOLUME C 52))
    (POUR-1 A C)
((VOLUME A 36) (VOLUME B 25) (VOLUME C 16))
    (POUR-1 B A)
((VOLUME A 16) (VOLUME B 45) (VOLUME C 16))
    (POUR-1 C B)
((VOLUME A 16) (VOLUME B 7) (VOLUME C 54))
    (POUR-1 C A)
((VOLUME A 36) (VOLUME B 7) (VOLUME C 34))

Fig. 4.8. A Plan for the Water Jug Problem



For this numerical problem a backward plan does not exist for every initial
state which can be transformed into the goal by forward operator application.
For instance, for the initial state volume(a, 16), volume(b, 27), volume(c, 34)
the following action sequence results in a goal state:

volume(a, 16), volume(b, 27), volume(c, 34)
    pour(c, b)
volume(a, 16), volume(b, 45), volume(c, 16)
    pour(b, a)
volume(a, 36), volume(b, 25), volume(c, 16)
    pour(a, c)
volume(a, 0), volume(b, 25), volume(c, 52)
    pour(b, a)
volume(a, 25), volume(b, 0), volume(c, 52)

but the initial state is not a legal predecessor of volume(a, 16), volume(b,
45), volume(c, 16) when applying the pour^{-1} operator. That is, backward
planning is incomplete! The reason for this incompleteness is that the initial
state in forward planning can be any arbitrary amount of water in the jugs,
while for backward planning it has to be a state obtainable by operator
application. A remedy for this problem is to introduce a second operator
fill-jugs(?j1, ?j2, ?j3) which has no precondition. This operator models the
initial filling up of jugs from a water source.
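
One way to see the incompleteness numerically is the following sketch. It assumes the inverse update exactly as stated in the prose (min[?c1, ?v1 + ?v2] for the first jug and max[0, ?v2 - ?c1 + ?v1] for the second); the dictionary-based state and the function names are illustrative only.

```python
CAPACITY = {"a": 36, "b": 45, "c": 54}

def pour(v, j1, j2):
    """Forward pour from j1 into j2."""
    v1, v2, c2 = v[j1], v[j2], CAPACITY[j2]
    if not (v2 < c2 and v1 > 0):
        return None
    new = dict(v)
    new[j1] = max(0, v1 - c2 + v2)
    new[j2] = min(c2, v1 + v2)
    return new

def pour_inverse(v, j1, j2):
    """Backward application, using the inverted updates given in the prose."""
    v1, v2 = v[j1], v[j2]
    c1, c2 = CAPACITY[j1], CAPACITY[j2]
    # Inverse precondition = forward postcondition.
    if not (v2 == c2 or v1 == 0):
        return None
    new = dict(v)
    new[j1] = min(c1, v1 + v2)
    new[j2] = max(0, v2 - c1 + v1)
    return new

s1 = {"a": 16, "b": 27, "c": 34}      # initial state from the text
s2 = {"a": 16, "b": 7, "c": 54}       # state occurring in the plan of figure 4.8
target = {"a": 16, "b": 45, "c": 16}

print(pour(s1, "c", "b") == target)   # both states lead to the same successor
print(pour(s2, "c", "b") == target)
print(pour_inverse(target, "c", "b")) # only one predecessor is produced
```

Both s1 and s2 are mapped to the same successor by pour(c, b), but backward application yields only s2. The state with 27 units in jug b is therefore never generated during backward planning, although a forward plan from it exists.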

4.4.3 Functional Planning for Standard Problems - Tower of Hanoi

A main motivation for modeling operator effects with updates of state vari- 
ables by function application given by Geffner (2000) is that plan construction 
can be performed more efficiently. We give an empirical demonstration of this 
claim by comparing the running times when planning for the standard specification
with those of the functional specification of Tower of Hanoi (see
fig. 4.1). The inverse operator move~^ was specified in section 4.3.1. 

The running times were obtained with our system DPlan on an Ultra
Sparc 5 system. DPlan is a universal planner implemented in Lisp. Because
DPlan is based on a breadth-first strategy and because we did not focus on
efficiency - for example we did not use OBDD representations (Cimatti et al.,
1998) - it has longer running times than state-of-the-art planners. Since we
are interested in comparing the performance when planning for the different
specifications we do not give the absolute times but the performance gain in
percent, calculated as $(st - ft) \cdot 100 / st$, where $st$ is the running time for the
standard specification and $ft$ the running time for the functional specification
(see table 4.2).²
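
As a quick arithmetic check, the performance gain can be recomputed from the absolute running times reported in the accompanying footnote. The function name `performance_gain` is chosen here for illustration.

```python
def performance_gain(st, ft):
    """Performance gain in percent: (st - ft) * 100 / st."""
    return (st - ft) * 100 / st

# Absolute running times in seconds (standard, functional) from the footnote.
hanoi_times = {3: (34.5, 1.26), 4: (335.89, 4.47), 5: (10015.59, 19.3)}

gains = {disks: round(performance_gain(st, ft), 1)
         for disks, (st, ft) in hanoi_times.items()}
print(gains)
```

The rounded values reproduce the entries of table 4.2 (96.3%, 98.7%, 99.8%).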

Because in the standard representation the relations between the sizes
of the different discs must be specified explicitly by static relations, the
number of relational symbols is quite large (18 literals for three disks, 25 for
four disks, etc.). In the functional representation, built-in functions can be
called to determine whether a disk of size 2 is smaller than a disk of size
3. The number of literals is very small (3 literals for an arbitrary number of
disks). Consequently planning for the standard representation requires more
matching than planning for the functional one, where the constraint of the
precondition has to be solved instead.

² The absolute values are (standard/functional): 3 discs: 34.5 sec/1.26 sec; 4
discs: 335.89 sec/4.47 sec; 5 discs: 10015.59 sec/19.3 sec; 6 discs (729 states):
>15 h/62.28 sec.



Table 4.2. Performance of FPlan: Tower of Hanoi 



Tower of Hanoi
Number of disks     3        4        5
States              27       81       243
Performance gain    96.3%    98.7%    99.8%



4.4.4 Mixing ADD/DEL Effects and Updates - Extended Blocksworld

In this section we present an extended version of the standard blocks-world
(see fig. 4.9). First we introduce an operator clearblock which clears a block 
?x by putting the block lying on top of ?x on the table - if block topof(?x) 
is clear. That is, a block is addressed in an indirect manner as result of a 
function evaluation (Geffner, 2000). As terms can be nested this corresponds 
to an infinite type. For a state where on(A, B) holds, the function topof(B) 
returns A, while for a state where on(C, B) holds, topof(B) returns C. The 
standard blocks-world operators put and puttable are also formulated using 
the topof function. Furthermore all operators are extended by an update 
for the value of the LastBlockMoved (Pednault, 1994). Note that we used a 
conditional effect for specifying the put operator. Our current implementation 
allows conditioned effects for ADD/DEL effects but not for updates. 
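
The indirect reference via topof can be illustrated with a small sketch of the clearblock operator. The state layout (a set of on pairs, a set of clear blocks, and a field for LastBlockMoved) is an assumption made for this example and not the representation used in DPlan.

```python
def topof(state, x):
    """Return the block lying on x, or None if no on(y, x) holds."""
    for upper, lower in state["on"]:
        if lower == x:
            return upper
    return None

def clearblock(state, x):
    """Clear block x by removing the block lying on top of it."""
    y = topof(state, x)                 # PRE: ?y = topof(?x)
    if y is None or y not in state["clear"]:
        return None                     # PRE: clear(?y)
    return {
        "on": state["on"] - {(y, x)},         # DEL on(?y, ?x)
        "clear": state["clear"] | {x},        # ADD clear(?x)
        "last_block_moved": y,                # UPDATE LastBlockMoved
    }

s = {"on": frozenset({("A", "B")}),
     "clear": frozenset({"A", "C"}),
     "last_block_moved": None}
t = clearblock(s, "B")
print(t)
print(topof({"on": frozenset({("C", "B")})}, "B"))
```

The block to be moved is never named in the operator call; it is obtained by evaluating topof in the current state, so the same call clearblock(B) moves A in one state and C in another.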

The update effect changing the value of LastBlockMoved cannot be handled
in backward planning because we cannot define inverse operators including
an inverse update of the argument LastBlockMoved. The inverse function
would have to return the block that will be moved in the next step. At
the current time step we do not know which block this will be.

Other extended blocks-world domains can be modeled with FPlan, for
example a domain including a robot agent who stacks and unstacks blocks with an
associated energy level which decreases with each action it performs. The
robot’s energy can be described with the relational symbol energy(?robot,
?energy) and is consumed according to the weight of the actual block being
carried by updating the variable ?energy in energy(?robot, ?energy).

4.4.5 Planning for Programming Problems - Sorting of Lists

With the extension to functional representation a number of programming
problems, for example list sorting algorithms such as bubble, merge or selection
sort, can be planned more efficiently. We have specified selection sort in
the standard and the functional way (see fig. 4.10).

(a) Operator: clearblock(?x)
    PRE:      {?y = topof(?x), clear(?y)}
    ADD:      {clear(?x)}
    DEL:      {on(?y, ?x)}
    UPDATE:   change ?block in LastBlockMoved(?block) to ?y

(b) Operator: puttable(?x)
    PRE:      {?x = topof(?y), clear(?x)}
    ADD:      {clear(?x)}
    DEL:      {on(?x, ?y)}
    UPDATE:   change ?block in LastBlockMoved(?block) to ?x

(c) Operator: put(?x, ?y)
    PRE:      {clear(?x), clear(?y)}
    ADD:      {on(?y, ?x)}
    DEL:      {clear(?x)}
    WHEN on(?x, ?z) ADD {on(?x, ?y), clear(?z)}
                    DEL {clear(?y), on(?x, ?z)}
    UPDATE:   change ?block in LastBlockMoved(?block) to ?y

Functions: topof(?x) = if on(?y, ?x) then ?y else nil

Fig. 4.9. Blocks-World Operators with Indirect Reference and Update



(a) Standard representation
Operator: swap(?i, ?j, ?x)
PRE:      {is-at(?x, ?i), is-at(?y, ?j), greater(?x, ?y)}
ADD:      {is-at(?x, ?j), is-at(?y, ?i)}
DEL:      {is-at(?x, ?i), is-at(?y, ?j)}

Initial State: {is-at(3, p1), is-at(2, p2), is-at(1, p3)}
Goal:          {is-at(1, p1), is-at(2, p2), is-at(3, p3)}

(b) Functional representation
Operator: swap(?i, ?j, ?x)
PRE:      {slist(?x), nth(?i, ?x) > nth(?j, ?x)}
UPDATE:   change ?x in slist(?x) to swap(?i, ?j, ?x)

Variables:     {(?i :range (0 2)) (?j :range (0 2))}
Initial State: {slist([3 2 1])}
Goal:          {slist([1 2 3])}
Functions:     swap(?i, ?j, ?x) = (let ((temp (nth j x)))
                                    (setf (nth j x) (nth i x)
                                          (nth i x) temp)
                                    x)

Fig. 4.10. Specification of Selection Sort

In the functional representation we can use the built-in function “>”
instead of the predicate greater, the constructor for lists slist, and the function
nth(n, L) to reference the nth element of the list L instead of specifying the
position of each element in the list by using the literal is-at(?x, ?i). In the
functional representation the indices to the list ?i and ?j are free variables,
which - for lists with three elements - can range from zero to two. When
calculating the possible instantiations for the operator swap the variables ?i,
?j can be assigned any value within their range for which the precondition
holds (definition 4.2.2). The function swap is defined as an external function.
The inverse function to swap is the function swap itself.
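
The instantiation of the free index variables and the self-inverse property of swap can be sketched as follows. Unlike the destructive Lisp definition in figure 4.10, this illustrative version returns a new tuple; the helper `applicable_swaps` is a hypothetical name for enumerating the instantiations that satisfy the precondition.

```python
def swap(i, j, xs):
    """Return a copy of xs with the elements at positions i and j exchanged."""
    ys = list(xs)
    ys[i], ys[j] = ys[j], ys[i]
    return tuple(ys)

def applicable_swaps(xs):
    """All (i, j) within range for which nth(i, xs) > nth(j, xs) holds."""
    n = len(xs)
    return [(i, j) for i in range(n) for j in range(n) if xs[i] > xs[j]]

start = (3, 2, 1)
print(applicable_swaps(start))
print(swap(0, 2, start))
# swap is its own inverse: applying it twice restores the list.
print(all(swap(i, j, swap(i, j, start)) == start
          for i, j in applicable_swaps(start)))
```

For the initial list [3 2 1] three instantiations satisfy the precondition, and swapping positions 0 and 2 already yields the sorted list.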



Table 4.3. Performance of FPlan: Selection Sort 

selection sort 

Number of list elements 3 4 5 

States 6 24 120 

Performance gain 27.3% 81.3% 96.1% 



The performance gain when planning for the functional representation is
large (table 4.3) but not as stunning as in the Hanoi domain.³ This may be
due to the free variables ?i and ?j that have to be assigned a value according
to their range. Consequently the constraint solving takes somewhat longer but is
still faster than the matching of the literals for the standard version (6 for
three list elements, 10 for four, etc. versus only one literal for the functional
representation).

4.4.6 Constraint Satisfaction as Special Case of Planning 

Logical programs can be defined over rules and facts (Sterling & Shapiro, 
1986). Rules correspond to planning operators. Facts are similar to static 
relations in planning, that is, they are assumed to be true in all situations. 
Constraint logic programming extends standard logic programming so that 
constraints can be used to efficiently restrict variable bindings (Frühwirth &
Abdennadher, 1997). In figure 4.11 an example for a program in constraint
Prolog is given. The problem is to compose a light meal consisting of an
appetizer, a main dish and a dessert which all together contain not more
than a given calorie value (see Frühwirth & Abdennadher, 1997).

lightmeal(A, M, D) ← 1400 > I + J + K, I > 0, J > 0, K > 0,
                     appetizer(A, I), main(M, J), dessert(D, K).

appetizer(radishes, 50).    main(beef, 1000).    dessert(icecream, 600).
appetizer(soup, 300).       main(pork, 1200).    dessert(fruit, 90).

Fig. 4.11. Lightmeal in Constraint Prolog

The same problem can be specified in FPlan (figure 4.12). The goal, consisting
of a conjunction of literals, now contains free variables and a set of
constraints determining the values of those variables. In this case ?x, ?y, ?z
are free variables that can be assigned any value within their range (here
enumerations of dishes). The ranges of the variables ?x, ?y, ?z are specified
for the domain and not for a problem. The function calories looks up the
calorie value of a dish in an association list.

³ The absolute values are (standard/functional): 3 elem.: 0.33 sec/0.24 sec; 4
elem.: 8.93 sec/1.67 sec; 5 elem.: 547.05 sec/21.57 sec; 6 elem. (720 states):
>15 h/717.41 sec.



Domain: Lightmeal
Variables:     {(?x :range (radishes, soup))
                (?y :range (beef, pork))
                (?z :range (fruit, icecream))}

Initial State: { } ; empty
Goal:          {lightmeal(?x, ?y, ?z),
                (calories(?x) + calories(?y) + calories(?z)) < 1400}

Functions:     calories(?dish) = cadr(assoc(?dish, calorie-list))
               calorie-list := ((radishes 50) (beef 1000) (fruit 90)
                                (icecream 600) (soup 300) (pork 1200) ...)

Fig. 4.12. Problem Specification for Lightmeal



There are no operators specified for this example. The planning process 
in this case solves the goal constraint by finding those combinations of dishes 
that satisfy the constraint. One example is a meal consisting of radishes with 50
calories, beef with 1000 calories and fruit with 90 calories.
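
Solving the goal constraint amounts to enumerating the variable ranges and keeping the combinations that satisfy the calorie bound. The following sketch does this by brute force; the dictionary names are assumptions for illustration, while the dishes, calorie values, ranges, and the bound of 1400 are taken from figure 4.12.

```python
from itertools import product

# Calorie values from the association list in figure 4.12.
CALORIES = {"radishes": 50, "soup": 300, "beef": 1000,
            "pork": 1200, "icecream": 600, "fruit": 90}

# Ranges of the free variables ?x, ?y, ?z.
RANGES = {"x": ("radishes", "soup"),
          "y": ("beef", "pork"),
          "z": ("fruit", "icecream")}

def light_meals(limit=1400):
    """All (appetizer, main, dessert) combinations below the calorie limit."""
    return [(x, y, z)
            for x, y, z in product(RANGES["x"], RANGES["y"], RANGES["z"])
            if CALORIES[x] + CALORIES[y] + CALORIES[z] < limit]

meals = light_meals()
for meal in meals:
    print(meal, sum(CALORIES[d] for d in meal))
```

Of the eight possible combinations, three satisfy the constraint, among them the meal of radishes, beef and fruit mentioned above.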
The first part of anything is usually easy.

— Travis McGee in: John D. MacDonald, The Scarlet Ruse, 1973 



As mentioned before, DPlan is intended as a tool to support the generation of 
finite programs. That is, plan generation is for small domains - with three or 
four objects - only. Generation of a universal plan covering all possible states 
of a domain, as we do with DPlan, is necessarily a complex problem because 
for most interesting domains the number of states grows exponentially with 
the number of objects. In the following, we first discuss what extensions and 
modifications would be needed to make DPlan competitive with state of the 
art planning systems. Afterwards, we discuss extensions which would improve 
DPlan as a tool for program synthesis. Finally, we discuss some relations to 
human problem solving. 



5.1 Comparing DPlan with the State of the Art 

Effort of universal planning is typically higher than the average effort of 
standard state-based planning because it is based on breadth-first search. 
But, if one is interested in generating optimal plans, this price must be paid.
Typically, universal planning terminates if the initial state is reached in some 
pre-image (see sect. 2.4.4 in chap. 2). In contrast, DPlan terminates after all 
states which can be transformed into the goal state are covered. Consequently, 
it is not possible to make DPlan more time efficient - for a domain with 
exponential growth an exponential number of steps is needed to enumerate 
all states. But DPlan could be made more memory efficient by representing 
plans as OBDDs (Jensen & Veloso, 2000). 

To be applied to standard planning problems, it would be necessary to 
modify DPlan. The main problem to overcome would be to generate complete 
state descriptions by backward operator application from a set of top-level 
goals, that is, from an incomplete state description (see sect. 3.1 in chap. 3). 
Some recent work by Wolf (2000) deals with that problem: Complete state 
descriptions are constructed as maximal sets of atoms which are not mutually 
excluded by the operator definitions of the domain.

Furthermore, DPlan currently terminates with the universal plan. For
standard planning, a functionality for extracting a linear plan for a fixed 
state must be provided. Plan extraction can be done for example by depth- 
first search in the universal plan (Schmid, 1999). 

In the context of planning and control rule learning (see sect. 2.5.2 in 
chap. 2), we argued that planning can profit from program synthesis: First, 
a universal plan is constructed for a domain with a small number of objects, 
then a recursive rule for the complete domain (with a fixed goal, such as “build 
a tower of sorted blocks”) is generalized. Afterwards, planning can be omitted 
and instead the recursive rule is applied. To make this argument stronger, it is 
important to provide empirical evidence. We must show that plan generation 
by applying the recursive rule results in correct and optimal plans and that 
these plans are calculated in significantly less time than with a state-of-the- 
art planner. For that reason, DPlan must be extended such that (1) learned 
recursive rules are stored with a domain, and (2) for a new planning problem, 
the appropriate rule is extracted from memory and applied.¹



5.2 Extensions of DPlan 

To make DPlan applicable to as large a set of problems of interest in
program synthesis as possible, DPlan should work
for a representation language with expressive power similar to PDDL (see 
sect. 2.2.1 in chap. 2). We already made an important step in that direction 
by introducing function application (chap. 4). The extension of the Strips 
planning language to function application has the following characteristics: 
Planning operators can be defined using arbitrary symbolical and numerical 
functions; ADD/DEL effects and updates can be combined; indirect reference 
to objects via function application allows for infinite domains; planning with 
resource constraints can be handled as special case. 

The proposed language FPlan can be used to give function application in 
PDDL (McDermott, 1998b) a clear semantics. The described semantics of op- 
erator application can be incorporated in arbitrary Strips planning systems. 
In contrast to other proposals for dealing with the manipulation of state vari- 
ables, such as ADL (Pednault, 1994), we do not represent them as specially 
marked “first class objects” (declared as fluents in PDDL) but as arguments 
of relational symbols. This results in a greater flexibility of FPlan, because
each argument of a relational symbol may in principle be changed by function
application. As a consequence, we can model domains usually specified by
operators with ADD/DEL effects alternatively by updating state variables
(Geffner, 2000). 



¹ Preliminary work for control rule application was done in a student project at
TU Berlin, 1999.

Work to be done includes detailed empirical comparisons of the efficiency
of plan construction with operators with ADD/DEL effects versus updates
and providing proofs of correctness and termination for planning with FPlan.
FPlan currently has no restrictions on what functions can be applied. As a
consequence, termination is not guaranteed. That is, we have to provide some
restrictions on function application.

Currently, DPlan works on Strips extended by conditional effects and
function application. For future extensions, we plan to include typing, negation
in pre-conditions, and quantification. These extensions make it possible to generate
plans for a larger class of problems. On the other hand, as already discussed
for function application, each language extension results in a higher complexity
for planning. While PDDL offers the syntax for all mentioned extensions,
up to now only a limited number of works propose an algorithmic realization
(Penberthy & Weld, 1992; Koehler et al., 1997) and careful analyses of
planning complexity are necessary (Nebel, 2000).



5.3 Universal Planning versus Incremental Exploration 

We argue that for learning a recursive generalization for a domain, the complete
structure of this domain must be known. Universal planning with DPlan
results in a DAG which represents the shortest action sequences to transform
each possible state over a fixed number of objects into a goal state. In chapter
8 we will show that from the structure of the DAG additional information
about the data type underlying this domain can be inferred, which is essential
for transforming the plan into a program term fit for generalization. Our approach
is “all-or-nothing” - that is, either a generalization can be generated
for a given universal plan or it cannot be generated, and if a generalization
can be generated it is guaranteed to generate correct and optimal action sequences
for transforming all possible states into a goal state. A restriction of
our approach is that the planning goal must also be a generalization of the
goal for which the original universal plan was generated (clear a block in an
n-block tower, transport n objects from A to B, sort a list with n elements,
build a tower of n sorted blocks, solve Tower of Hanoi for n discs).

Other approaches to learning control rules for planning, namely learning 
in Prodigy (Veloso et al., 1995) and Geffner’s approach (Martin & Geffner, 
2000), on the one hand offer more flexibility but on the other hand cannot 
guarantee that learning in every case results in a better performance (see 
sect. 2.5.2 in chap. 2): The system is exposed incrementally to arbitrary planning 
experience. For example, in the blocks-world domain, the system might 
be first confronted with a problem “find an action sequence for transforming 
a state where all given four blocks A, B, C, and D are lying on the table 
into one where B is on A and C is on D”; and next with a problem “find 
an action sequence for transforming a state where all given six blocks 
are stacked in reverse order into one where blocks A to D are stacked into 
an ordered tower and E and F are lying on the table”. Rules generalized 
from this experience might or might not work for new problems. For each 
new problem, the learned rules are applied and if rule application results in 
failure the system must extend or modify its learning experience. 

This incremental approach to learning is similar to the models of hu- 
man skill acquisition as, for example, proposed by Anderson (1983) in his 
ACT theory, although these approaches address only the acquisition of linear 
macros (see sect. 2.5.2 in chap. 2). To get some hints whether our universal 
planning approach has plausibility for human cognition, the following empir- 
ical study could be conducted: For a given problem domain, such as Tower of 
Hanoi, one group of subjects is confronted with all possible constellations of 
the three-disc problem and one group of subjects is confronted with arbitrary 
constellations of problems with different numbers of discs. Both groups have 
to solve the given problems. Afterwards, performance on arbitrary Tower of 
Hanoi problems is tested for both groups and both groups are asked for 
the general rule for solving Tower of Hanoi. We assume that explicit generation 
of all action sequences for a problem of fixed size facilitates detection 
of the structure underlying a domain and thereby the formation of a general 
solution principle. 



Part II 

Inductive Program 
Synthesis 




6. Automatic Programming 



“So much is observation. The rest is deduction.” 

- Sherlock Holmes to Watson in: Arthur Conan Doyle, The Sign of Four, 1930 



Automatic programming is investigated in artificial intelligence and software 
engineering. The overall research goal in automatic programming is to autom- 
atize as large a part of the development of computer programs as possible. 
A more modest goal is to automatize or support special aspects of program 
development - such as program verification or generation of high-level pro- 
grams from specifications. The focus of this chapter is on program generation 
from specifications - referred to as automatic program construction or pro- 
gram synthesis. There are two distinct approaches to this problem: Deduc- 
tive program synthesis is concerned with the automatic derivation of correct 
programs from complete, formal specifications; inductive program synthesis 
is concerned with automatic generalization of (recursive) programs from in- 
complete specifications, mostly from input/output examples. In the following, 
we first (sect. 6.1) give an introductory overview of approaches to automatic 
programming, together with pointers to literature. Afterwards (sect. 6.2), we 
briefly review theorem proving and transformational approaches to deductive 
program synthesis. Since our own work is in the context of inductive 
program synthesis, we will go into more detail, presenting this area of research 
(sect. 6.3): In section 6.3.1, the foundations of automatic induction - 
that is, of machine learning - are introduced; grammar inference is discussed 
as theoretical basis of inductive synthesis. Afterwards, genetic programming 
(sect. 6.3.2), inductive logic programming (sect. 6.3.3), and the synthesis of 
functional programs (sect. 6.3.4) are presented. Finally, we discuss deductive 
versus inductive and functional versus logical program synthesis (sect. 6.4). 
Throughout the text we give illustrations using simple programming problems. 

6.1 Overview of Automatic Programming Research 

6.1.1 AI and Software Engineering 

Software engineering is concerned with providing methodologies and tools 
for the development of software (computer programs). Software engineering 
involves at least the following activities: 

Specification: Analysis of requirements and the desired behavior of the pro- 
gram and designing the internal structure of the program. Design speci- 
fication might be stepwise refined, from an overall structure of software 
modules to the (formal) specification of algorithms and data structures. 
Development: Realizing the software design in executable program code (pro- 
gramming, implementation). 

Validation: Ensuring that the program does what it is expected to do. One 
aspect of validation - called verification - is that the implemented program 
realizes the specified algorithms. Program verification is realized 
by giving (formal) proofs that the program fulfills the (formal) specifica- 
tion. A verified program is called correct with respect to a specification. 
The second aspect of validation is that the program meets the initially 
specified requirements. This is usually done by testing. 

Maintenance: Fixing program errors, modifying and adding features to a 
program (updating). 

Software products are expected to be correct, efficient, and transparent. 
Furthermore, programs should be easily modifiable, maintaining the qual- 
ity standards correctness, efficiency, and transparency. Obviously, software 
development is a complex task and since the eighties a variety of computer- 
aided software engineering (CASE) tools have been developed - supporting 
project management, design (e. g., checking the internal consistency of mod- 
ule hierarchies), and code generation for simple routine tasks (Sommerville, 
1996). 

6.1.1.1 Knowledge-Based Software Engineering. A more ambitious 
approach is knowledge-based software engineering (KBSE). The research goal 
of this area is to support all stages of software development by “intelligent” 
computer-based assistants (Green et al., 1983). KBSE and automatic programming 
are often used as synonyms. 

A system providing intelligent support for software development must in- 
clude several aspects of knowledge (Smith, 1991; Green & Barstow, 1978): 
General knowledge about the application domain and the problem to be 
solved, programming knowledge about the given domain, as well as gen- 
eral programming knowledge about algorithms, data structures, optimization 
techniques, and so on. AI technologies, especially for knowledge representa- 
tion, automated inference, and planning, are necessary to build such systems. 
That is, a KBSE system is an expert system for software development. 
The lessons learned from the limited success of expert systems research in 
the eighties also apply to KBSE: It is not realistic to demand that a single 
KBSE system cover all aspects of knowledge and all stages of software 
development. Instead, systems typically are restricted in at least one of the 
following ways (Flener, 1995; Rich & Waters, 1988): 

- Constructing systems for expert specifiers instead of end-users. 

- Restricting the system to a narrow domain. 

- Providing interactive assistance instead of full automatization. 

- Focusing on a small part of the software development process. 

Examples of specialized systems are Aries (Johnson & Feather, 1991) for 
specification acquisition, KIDS (Smith, 1990) for program synthesis from 
high-level specifications (see sect. 6.2.2.3), or PVS (Bold, 1995) for program 
verification. Automatic programming is mainly an area of basic research. Up 
to now, only a small number of systems are applicable to real-world software 
engineering problems. 

Below, we will introduce in detail approaches to program synthesis, that is, KBSE 
systems addressing the aspect of code generation from specifications. 
From a software engineering standpoint, the main advantage of automatic 
code generation is that ex-post verification of programs becomes obsolete if 
it is possible to automatically derive (correct) programs from specifications. 
Furthermore, the problems of program modification can be shifted to the 
more abstract - and therefore hopefully more transparent - level of specifi- 
cation. 

6.1.1.2 Programming Tutors. KBSE research aims at providing systems 
that support expert programmers. Another area of research where AI and 
software engineering interact is the development of tutor systems for support 
and education of student programmers. Such tutoring systems must incorporate 
programming knowledge to a smaller extent than KBSE systems, but 
additionally, they have to rely on knowledge about efficient teaching methods. 
Furthermore, user-modeling is critical for providing helpful feedback for 
programming errors. 

Examples of programming tutors are the Lisp-tutors from Anderson, 
Conrad, and Corbett (1989) and Weber (1996) and the tutor Proust from 
Johnson (1988). Anderson’s Lisp-tutor is based on his ACT theory (Anderson, 
1983). The tutoring strategy is based on the assumption that programming 
is a cognitive skill, based on a growing set of production rules. Training 
focuses on acquisition of such rules (“IF the goal is to obtain the first element 
of a list l THEN write (car l)”), and errors are corrected immediately 
to prevent students from acquiring faulty rules. Error-recognition and feedback 
are based on a library of expert rules and rules which result in programming 
errors. The latter were acquired in a series of empirical studies of novice 
programmers. The system tries to reproduce a faulty program by applying rules from 
the library, and feedback is based on the rules which generated the errors. 
Weber’s tutor is based on episodic user modeling. The programs a student 
generates over a curriculum are stored as schemes and in a current session 
the stored programming episodes are used for feedback. The Proust system 
is based on the representation of plans (corresponding to high-level specifica- 
tions of algorithms). Ideally, more than one correct program can be derived 
from a correct plan. Errors are detected by trying to identify the faulty plan 
underlying a given program. 

Programming tutors were mainly developed during the eighties. Simulta- 
neously, novice programmers were studied intensively in cognitive psychology 
(Widowski & Eyferth, 1986; Mayer, 1988; Soloway & Spohrer, 1989). One rea- 
son for this interest was that this area of research offered an opportunity to 
bring theories and empirical results of psychology to application; a second 
reason was that programming offers a relatively narrow domain which does 
not strongly depend on previous experience in other areas to study the de- 
velopment of human problem solving skills (Schmid, 1994; Schmid & Kaup, 
1995). 

6.1.2 Approaches to Program Synthesis 

Research in program synthesis addresses the problem of automatic generation 
of program code from specifications. The nature of this problem depends on 
the form in which a specification is given. In the following, we first introduce 
different ways to specify programming problems. Afterwards we introduce 
different synthesis methods. Synthesis methods can be divided into two classes 
- deductive program synthesis from complete formal specifications and in- 
ductive program synthesis from incomplete specifications. Therefore, we first 
contrast deductive and inductive inference, and then characterize deductive 
and inductive program synthesis as special cases of these inference meth- 
ods. Finally, we will mention schema-based and analogy-based approaches as 
extensions of deductive and inductive synthesis. 

6.1.2.1 Methods of Program Specification. Table 6.1 gives an illustration 
of possible ways in which a program for returning the last element of a 
non-empty list can be specified. 

If a specification is given just as an informal (natural language) descrip- 
tion of requirements (Green & Barstow, 1978), we are back at the ambitious 
goal of automatic programming - often ironically called “automagic pro- 
gramming” (Rich & Waters, 1986a, p. xi). Besides the problem of automatic 
natural language understanding in general, the main difficulty of such an ap- 
proach would be to derive the required information - for example, the desired 
input/output relation - from an informal statement. For the informal specification 
of last in table 6.1, the synthesis system must have knowledge about lists, what it 
means that a list is not empty, what “element of a list” refers to and what is 
meant by “last”. 

Table 6.1. Different Specifications for Last 

Informal Specification 

    Return the last element of a non-empty list. 

Declarative Programs 

    last([X], X). 
    last([X|T], Y) :- last(T, Y). 
        (logic program, Prolog) 

    fun last(l) = 
      if null(tl(l)) then hd(l) else last(tl(l)); 
        (functional program, ML) 

    fun last (x::nil) = x 
      | last (x::rest) = last (rest); 
        (ML with pattern matching) 

Complete, Formal Specifications 

    last(l) <= find z such that for some y, l = y $\circ$ [z] 
               where islist(l) and l $\neq$ [ ] 
        (Manna & Waldinger, 1992, p. 4) 

    last : seq_1 X -> X 
    forall s : seq_1 X • last s = s(#s) 
        (Z) (Spivey, 1992, p. 117) 

Incomplete Specifications 

    last([1], 1), last([2 5 6 7], 7), last([9 3 4], 4) 
        (I/O Pairs, logical) 

    last([1]) = 1, last([2 7]) = 7, last([5 3 4]) = 4 
        (I/O Pairs, functional, first 3 inputs) 

    last([x_1]) = x_1, last([x_1 x_2]) = x_2, last([x_1 x_2 x_3]) = x_3 
        (Generic I/O Pairs) 

    last([1 2 3]) -> last([2 3]) -> last([3]) -> 3 
        (Trace) 

    last(l) = if null(tl(l)) then hd(l) else if null(tl(tl(l))) 
              then hd(tl(l)) else if null(tl(tl(tl(l)))) then hd(tl(tl(l))) 
        (Complete generic trace) 


If a specification is given as a complete formal representation of an algorithm, 
we have a more realistic goal, which is addressed in approaches of 
deductive program synthesis. Both example specifications in table 6.1 give 
a non-operational description of the problem to be solved. The specification 
given by Manna and Waldinger (1992) states that for a non-empty list l there 
must be a sub-list y such that y concatenated with the searched-for element z 
is equal to l. The Z specification (Spivey, 1992) states that for all non-empty 
sequences s (defined by seq_1 X) last returns the element at the last position 
of this sequence (s(#s)), where #s gives the length of s. 

The distinction between “very high level” specification languages - such 
as Gist or Z (Potter, Sinclair, & Till, 1991) - and “high level” programming 
languages is not really strict: What is seen as specification language today 
might be a programming language in the future. In the fifties, assembler languages 
and the first compiler language Fortran were classified as automatic 
programming systems (see Rich & Waters, 1986a, p. xi, for a reprint from 
Communications of the ACM, Vol. 1(4), April 1958, p. 8). 

Formal specification languages as well as programming languages make it possible 
to formulate unambiguous, complete statements because they have a clearly defined 
syntax and semantics. In general, a programming language guarantees 
that all syntactically correct expressions can be transformed automatically into 
machine-executable code, while this need not be true for a specification language. 
Specification languages and high-level declarative programming languages 
(i. e., functional and logic languages) share the common characteristic 
that they abstract from “how to solve” a problem on a machine and focus 
on “what to solve”; that is, they have more expressive power and provide 
more abstract constructs than typical imperative programming languages 
(like C). Nevertheless, generating specifications or declarative programs requires 
experts who are trained in representing problems correctly and completely - 
that is, giving the desired input/output relations and covering all 
possible inputs. 

For that reason, inductive program synthesis addresses the problem of 
deriving programs from incomplete specifications. The basic idea is that a user 
presents some examples of the desired program behavior. Specification by 
examples is incomplete, because the synthesis algorithm must generalize the 
desired program behavior from some inputs to all possible inputs. Furthermore, 
the examples themselves can contain more or less information. Examples 
might be presented simply as input/output pairs. For structural list 
problems (such as last or reverse), generic input/output pairs, abstracting from 
concrete values of list elements, can be given. This makes the examples less 
ambiguous. 

More information about the desired program can be provided, if examples 
are represented as traces which illustrate the operation of a program. Traces 
can prescribe the data structures and operations to be used, if the inputs are 
described by tests and outputs by transformations of the inputs using pre- 
defined operators. Traces are called complete, if all information about data 
structures changed, operators applied, and control decisions taken is given. 
Additionally, examples are more informative, if they indicate what kind of 
computations are not desired in the goal program. This can be done by pre- 
senting positive and negative examples. Another strategy is to present exam- 
ples for the first k inputs, thereby defining an order over the input domain. 
Of course, the more information is presented by an example specification, 
the more the system user has to know about the structure of the desired 
program. 
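
As a small illustration of what a trace specification contains, the following Python sketch records the rewriting steps of last on a concrete list, mirroring the "(Trace)" row of table 6.1. The function name last_trace and the string rendering of the steps are assumptions made for this illustration only; they are not part of any synthesis system discussed here.

```python
def last_trace(xs):
    """Return the rewrite trace of `last` on a non-empty list, as strings.

    Each step drops the head of the list (tl) until only one element
    remains, whose head (hd) is the final result.
    """
    if not xs:
        raise ValueError("last is undefined for the empty list")
    steps = []
    while len(xs) > 1:              # the test null(tl(l)) fails
        steps.append(f"last({xs})")
        xs = xs[1:]                 # continue with tl(l)
    steps.append(f"last({xs})")     # the test null(tl(l)) succeeds
    steps.append(str(xs[0]))        # result hd(l)
    return steps
```

For the input [1, 2, 3] the sketch yields the four steps of the trace in table 6.1. In contrast to a plain input/output pair, the trace exposes which operations (tl, hd) are applied and how often.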

The kind and number of examples must suffice to specify what the desired 
program is supposed to calculate. Sufficiency is dependent on the synthesis 
algorithm which is applied. In general, it is necessary to present more than 
one example, because otherwise a program with constant output is the most 
parsimonious induction. 
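
The following Python sketch makes this point concrete under simple assumptions: a hypothesis is any Python function, and it is accepted if it reproduces all given input/output pairs. The helper names consistent, constant_seven, and last are chosen for this illustration; the example pairs are the logical I/O pairs of table 6.1.

```python
def consistent(program, examples):
    """True if program(x) == y for every example pair (x, y)."""
    return all(program(x) == y for x, y in examples)


def constant_seven(_xs):
    """A program with constant output, ignoring its input."""
    return 7


def last(xs):
    """Recursive last, following the ML program in table 6.1."""
    return xs[0] if not xs[1:] else last(xs[1:])


one_example = [([2, 5, 6, 7], 7)]
three_examples = [([1], 1), ([2, 5, 6, 7], 7), ([9, 3, 4], 4)]
```

With one_example alone, both constant_seven and last are consistent, so the simpler constant program would be preferred. With three_examples, constant_seven is rejected, and so is the built-in max (which returns 9 for [9, 3, 4]), while last remains consistent.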

6.1.2.2 Synthesis Methods. Deductive and inductive synthesis are special 
cases of deductive and inductive inference. Therefore, we first give a general 
characterization of deduction and induction: 

Deductive versus Inductive Inference. Deductive inference addresses the problem 
of inferring new facts or rules (called theorems) from a set of given facts 
and rules (called theory or axioms) which are assumed to be true. For example, 
from the axioms 

1. list(l) $\rightarrow$ $\neg$ null(cons(x, l)) 

2. list([A,B,C]) 

we can infer theorem 

3. $\neg$ null(cons(Z, [A,B,C])). 

Deductive approaches are typically based on a logical calculus. That is, 
axioms are represented in a formal language (such as first order predicate 
calculus) and inference is based on a syntactical proof mechanism (such as 
resolution). An example for a logical calculus was given in section 2.2.2 in 
chapter 2 (situation calculus). The theorem given above can be proved by 
resolution in the following way: 

1. null(cons(Z, [A,B,C])) (Negation of the theorem) 

2. $\neg$ list(l) $\vee$ $\neg$ null(cons(x, l)) (axiom 1) 

3. $\neg$ list([A,B,C]) (Resolve 1, 2 with substitution $\sigma$ = {x/Z, l/[A,B,C]}) 

4. list([A,B,C]) (axiom 2) 

5. contradiction (Resolve 3, 4). 
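
The proof above can be replayed mechanically. The following Python sketch is a minimal illustration, not a full theorem prover: terms are nested tuples, variables are strings starting with "?", the list [A,B,C] is treated as one constant, and unification is done without occurs check. All function names (unify, subst, resolve) are assumptions of this sketch.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Return a substitution extending s that unifies a and b, or None.
    Simplification: no occurs check."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def subst(t, s):
    """Apply substitution s to term t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return tuple(subst(x, s) for x in t)
    return t


def resolve(c1, c2):
    """Resolve two clauses on the first complementary, unifiable literal pair.
    A clause is a collection of literals (sign, atom).
    Returns (resolvent, substitution) or None."""
    for sign1, atom1 in c1:
        for sign2, atom2 in c2:
            if sign1 != sign2:
                s = unify(atom1, atom2, {})
                if s is not None:
                    rest = [l for l in c1 if l != (sign1, atom1)]
                    rest += [l for l in c2 if l != (sign2, atom2)]
                    return frozenset((sg, subst(a, s)) for sg, a in rest), s
    return None


ABC = "[A,B,C]"
negated_theorem = [(True, ("null", ("cons", "Z", ABC)))]
axiom1 = [(False, ("list", "?l")), (False, ("null", ("cons", "?x", "?l")))]
axiom2 = [(True, ("list", ABC))]

step3, sigma = resolve(negated_theorem, axiom1)
step5, _ = resolve(step3, axiom2)
```

Here step3 corresponds to line 3 of the proof, sigma to the substitution given there, and the empty clause step5 to the contradiction in line 5.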

Inductive inference addresses the problem of inferring a generalized rule 
which holds for a domain (called hypothesis) from a set of observed instances 
belonging to this domain (called examples). For example, from the examples 

- list([Z,A,B,C]) 

- list([1,2,3]) 

- list([R,B]) 

axiom (1) given above might be inferred as hypothesis for the domain of (flat) 
lists. For this inference, it is necessary to refer to some background knowledge 
about lists, namely that lists are constructed by means of the list-constructor 
cons(x, l) and that null(l) is true if l is the empty list and false otherwise. 
Other hypotheses which could be inferred are 

- list(l) 

which is an over-generalization because it includes objects which are not lists 
(i. e., atoms) or 

- l $\in$ {[Z,A,B,C], [1,2,3], [R,B]} $\rightarrow$ list(l) 

which is no generalization but just a compact representation of the examples. 

Inductive approaches are realized within the different frameworks offered 
by machine learning research. The language for representing hypotheses de- 
pends on the selected framework. Examples are decision trees, a subset of 
predicate calculus, or functional programs. Also depending on the framework, 
hypothesis construction can be constrained more or less by the presented 
examples. In the extreme case, hypotheses might be generated just by enumeration 
(Gold, 1967): a legal expression of the hypothesis language is generated and 
tested against the examples. If the hypothesis is not consistent with the examples, 
it is rejected and a new hypothesis is generated. Note that testing 
whether a hypothesis holds for an example corresponds to deductive inference 
(Mitchell, 1997, p. 291). An overview of machine learning is given by Mitchell 
(1997) and we will present approaches to inductive inference in section 6.3. 
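
Induction by enumeration can be sketched in a few lines of Python. The hypothesis language assumed here is deliberately tiny and chosen only for illustration: compositions of the three list primitives hd, tl, and rev, enumerated by increasing length so that shorter (simpler) hypotheses are tried first. The names PRIMITIVES, run, and induce are hypothetical.

```python
from itertools import product

# Hypothetical hypothesis language: compositions of these primitives.
PRIMITIVES = {
    "hd": lambda xs: xs[0],
    "tl": lambda xs: xs[1:],
    "rev": lambda xs: xs[::-1],
}


def run(names, value):
    """Apply the named primitives right to left, as in function composition."""
    for name in reversed(names):
        value = PRIMITIVES[name](value)
    return value


def consistent(names, examples):
    """Test a hypothesis against all examples (the deductive part)."""
    try:
        return all(run(names, x) == y for x, y in examples)
    except (IndexError, TypeError):
        return False  # hypothesis is undefined on some example input


def induce(examples, max_len=3):
    """Generate and test: return the first consistent hypothesis, if any."""
    for n in range(1, max_len + 1):
        for names in product(PRIMITIVES, repeat=n):
            if consistent(names, examples):
                return names
    return None


examples = [([1], 1), ([2, 5, 6, 7], 7), ([9, 3, 4], 4)]
```

For the three examples, the search returns ("hd", "rev"), that is, the head of the reversed list, which computes last. For the single example ([1], 1) it already stops at ("hd",), which shows how strongly the result depends on the presented examples. Note that this language contains no recursion; it only serves to show the generate-and-test loop.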

Deduction guarantees that the inference is correct (with respect to the 
given axioms), while induction only results in a hypothesis. From the perspec- 
tive of epistemology, one might say that a deductive proof does not generate 
knowledge - it explicates information which is already contained in the given 
axioms. Inductive inference results in new information, but the inference has 
only hypothetical status which holds as long as the system is not confronted 
with examples which contradict the inference. 

To make this difference more explicit, let us look at a set-theoretic 
interpretation of inference: 

Definition 6.1.1 (Relations in a Domain). Let $\mathcal{D}$ be a set of objects 
belonging to a domain. A relation with arity $i$ is given as $R_i \subseteq \mathcal{D}^i$. The set of 
all relations which hold in a domain is $\mathcal{R}$. 

For a given domain, for example the set of flat lists, the set of all axioms 
which hold in this domain is extensionally given by relations. For example, 
there might be a unary relation R containing all lists, and a binary relation 
R' containing all pairs of lists (l, cons(x, l)) where l $\in$ R. In general, relations 
can represent formulas of arbitrary complexity. 

Now we can define (the semantics of) deduction and induction in the 
following way: 

Definition 6.1.2 (Deduction). Given a set of axioms $\mathcal{A} \subset \mathcal{R}$, deductive 
inference means to decide whether for a theorem $R \notin \mathcal{A}$ it holds that $R \in \mathcal{R}$. 

In a deductive (proof) calculus, $R \in \mathcal{R}$ is decided by showing that $R$ can 
be derived from the given axioms $\mathcal{A}$, that is, $\mathcal{A} \models R$. A standard syntactical 
approach to deductive inference is, for example, resolution (see illustration 
above). 

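Definition 6.1.3 (Induction). Given a set of examples $\mathcal{E} \subset \mathcal{R}$, inductive 
inference means to search a hypothesis $\mathcal{H} = \{R_1, \ldots, R_n\}$ such that $\mathcal{H} \neq \mathcal{E}$ 
and $\forall E_i \in \mathcal{E} : E_i \in \mathcal{H}$. 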

The search for a hypothesis takes place in the set of possible hypotheses given 
a fixed hypothesis language, for example, the set of all syntactically correct 
Lisp programs. Typically, selection of a hypothesis is not only restricted by 
demanding that $\mathcal{H}$ explains (“covers”) all presented examples, but also by 
additional criteria such as simplicity. 

Deductive versus Inductive Synthesis. In program synthesis, the result of inference 
is a computer program which transforms all legal inputs x into the 
desired output y = f(x). For deductive synthesis, the complete formal specification 
of the pre- and post-conditions of the desired program can be represented 
as a theorem of the following form: 

Definition 6.1.4 (Program Specification as Theorem). 

$\forall x \, \exists y \, [\mathit{Pre}(x) \rightarrow \mathit{Post}(x, y)]$. 

For example, for the specification of the last-program (see tab. 6.1), Pre(x) 
states that x is a non-empty list and Post(x, y) states that y is the last 
element of list x. A constructive theorem prover (such as Green’s approach, 
described in sect. 2.2.2 in chap. 2) tries to prove that this theorem holds. If 
the proof fails, there exists no feasible program (given the axioms on which 
the proof is based); if the proof succeeds, a program f(x) is returned as result 
of the constructive proof. For f(x), the following must hold: 

Definition 6.1.5 (Program Specification as Constructive Theorem). 

$\forall x \, [\mathit{Pre}(x) \rightarrow \mathit{Post}(x, f(x))]$ 

In the proof, the existentially quantified variable y must be explicitly 
constructed as a function over x (Skolemization). 
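
To illustrate the role of Pre and Post, the following Python sketch states both conditions for last as executable predicates and checks a candidate f on a finite set of inputs. This is only a test, not a proof: deductive synthesis establishes the implication for all x, whereas the sketch samples a few inputs. The names pre, post, and satisfies_spec are assumptions of this illustration; post follows the Manna and Waldinger specification in table 6.1.

```python
def pre(x):
    """Pre(x): x is a non-empty list."""
    return isinstance(x, list) and len(x) > 0


def post(x, y):
    """Post(x, y): x equals some front part concatenated with [y]."""
    return x[:-1] + [y] == x


def last(x):
    """Candidate program f, following the recursive ML definition."""
    return x[0] if not x[1:] else last(x[1:])


def satisfies_spec(f, inputs):
    """Check Pre(x) -> Post(x, f(x)) on the given inputs only."""
    return all(post(x, f(x)) for x in inputs if pre(x))


inputs = [[1], [2, 7], [5, 3, 4], []]
```

On these inputs, last satisfies the specification, while a program returning the first element does not (it fails on [2, 7]). The empty list violates Pre and is therefore not constrained.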

An alternative to theorem proving is program transformation. In theorem 
proving, each proof step consists of rewriting the current program (formula) 
by selecting a given axiom and applying the proof method (e. g., resolution). 
In program transformation, rewriting is based on transformation rules. 

Definition 6.1.6 (Transformation Rule). A transformation rule is defined 
as a conditioned rule: 

If Application-Condition Then Left-Hand-Pattern $\rightarrow$ Right-Hand-Pattern. 

The rules might be given in equational logic (i. e., a special class of axioms) 
or they might be uni-directional. Each transformation step consists of 
rewriting the current program by replacing some part which matches 
the Left-Hand-Pattern by another expression which matches the 
Right-Hand-Pattern. 
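
A conditioned rewrite rule of this form can be sketched in Python as a triple of condition, left-hand pattern, and right-hand pattern. The two rules below are hypothetical and chosen only to stay close to the running example; terms are nested tuples, pattern variables start with "?", and, as a simplification, rules are applied at the root of the term only.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def match(pattern, term, b):
    """One-way matching: bind pattern variables to parts of term, or None."""
    if is_var(pattern):
        if pattern in b:
            return b if b[pattern] == term else None
        return {**b, pattern: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term)):
        for p, t in zip(pattern, term):
            b = match(p, t, b)
            if b is None:
                return None
        return b
    return b if pattern == term else None


def instantiate(pattern, b):
    """Replace pattern variables by their bindings."""
    if is_var(pattern):
        return b[pattern]
    if isinstance(pattern, tuple):
        return tuple(instantiate(p, b) for p in pattern)
    return pattern


# (application condition, left-hand pattern, right-hand pattern)
RULES = [
    (lambda b: b["?l"] == "nil", ("last", ("cons", "?x", "?l")), "?x"),
    (lambda b: b["?l"] != "nil", ("last", ("cons", "?x", "?l")), ("last", "?l")),
]


def rewrite_step(term):
    """Apply the first applicable rule at the root, or return None."""
    for condition, lhs, rhs in RULES:
        b = match(lhs, term, {})
        if b is not None and condition(b):
            return instantiate(rhs, b)
    return None


def normalize(term):
    """Rewrite until no rule applies; return all intermediate terms."""
    steps = [term]
    nxt = rewrite_step(term)
    while nxt is not None:
        steps.append(nxt)
        nxt = rewrite_step(nxt)
    return steps


term = ("last", ("cons", 1, ("cons", 2, ("cons", 3, "nil"))))
```

Calling normalize(term) rewrites the term in three steps to 3. Transformation systems for program synthesis rewrite specifications into programs rather than evaluating terms, but the matching and replacement mechanism is the same.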

In section 6.2 we will present theorem proving as well as transformational 
approaches to deductive synthesis. 

For inductive synthesis, a program is constructed as generalization over 
the given incomplete specification (input/output examples or traces, see 
tab. 6.1) such that the following proposition holds: 

Definition 6.1.7 (Characteristic of an Induced Program). 

$\forall (x, y) \in \mathcal{E} \; [f(x) = y]$ 

where $\mathcal{E}$ is a set of examples with x as possible input in the desired program 
and y as corresponding output value or trace for x. 

The formal foundation of inductive synthesis is grammar inference. Here 
the examples are viewed as words which belong to some unknown formal 
language and the inference task is to construct a grammar (or automaton) 
which generates (or recognizes) a language to which the example words be- 
long. The classical approach to program synthesis is the construction of re- 
cursive functional (mostly Lisp) programs from traces. Such traces are either 
given as input or automatically constructed from input/output examples. 
Alternatively, in the context of inductive logic programming, the construction 
of logical (Prolog) programs from input/output examples is investigated. In 
both research areas, there are methods which base hypothesis construction 
strongly on the presented examples versus methods which depend more on 
search in hypothesis space (i. e., generate and test). An approach which is 
based explicitly on generate and test is genetic programming. 

In section 6.3 we will present grammar inference, functional program syn- 
thesis, inductive logic programming, and genetic programming as approaches 
to inductive program synthesis. 

Enriching Program Synthesis with Knowledge. The basic approaches to de- 
ductive and inductive program synthesis depend on a limited amount of 
knowledge: a set of (correct) axioms or transformation rules in deduction; or a 
restricted hypothesis language and some rudimentary background knowledge 
(for example about data types and primitive operators) in induction. These 
basic approaches can be enriched by additional knowledge. Such knowledge 
might be not universally valid but can make a system more powerful. As a 
consequence, a basic fully automatic, algorithmic approach is extended to an 
interactive, heuristic approach. 

A successful approach to knowledge-based program synthesis is to guide 
synthesis by pre-defined program schemes, also called design strategies. A 
program scheme is a program template (for example represented in higher 
order logic) representing a fixed over-all program structure (i. e., a fixed flow 
of control) but abstracting from concrete operations. For a new programming 
problem, an adequate scheme is selected (by the user) from the library and a 
program is constructed by stepwise refinement of this scheme. For example, 
a quicksort program might be synthesized by refining a divide-and-conquer 
scheme. Scheme-based synthesis is typically realized within the deductive 
framework (Smith, 1990). An approach to combine inductive program syn- 
thesis and schema-refinement is proposed by Flener (1995). 
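
The idea of a program scheme can be sketched in Python as a higher-order function that fixes the flow of control and leaves the concrete operations open. The parameter names (is_primitive, solve_directly, decompose, compose) are assumptions of this sketch and do not reproduce the formalization used in KIDS or by Flener; the sketch only shows how quicksort arises by filling in the open operations.

```python
def divide_and_conquer(is_primitive, solve_directly, decompose, compose):
    """Scheme: fixed control structure, abstract operations."""
    def solve(problem):
        if is_primitive(problem):
            return solve_directly(problem)
        subproblems, rest = decompose(problem)
        return compose([solve(p) for p in subproblems], rest)
    return solve


# Refinement of the scheme to quicksort: the first element is the pivot.
quicksort = divide_and_conquer(
    is_primitive=lambda xs: len(xs) <= 1,
    solve_directly=lambda xs: list(xs),
    decompose=lambda xs: (
        [[x for x in xs[1:] if x < xs[0]],
         [x for x in xs[1:] if x >= xs[0]]],
        xs[0],
    ),
    compose=lambda solved, pivot: solved[0] + [pivot] + solved[1],
)
```

For example, quicksort([3, 1, 4, 1, 5, 9, 2, 6]) returns the sorted list. Other instantiations of the same scheme (for instance with a trivial decomposition and a merging composition) would yield other sorting algorithms, which is what makes a scheme library useful for guiding synthesis.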

A special approach to inductive program synthesis is programming by 
analogy, that is, program reuse. Here, already known programs (predefined 
or previously synthesized) are stored in a library. For a new programming 
problem, a similar problem and the program which solves this problem are 
retrieved and the new program is constructed by modifying the retrieved 
program. Analogy can be seen as a special case of induction because modifi- 
cation is based on the structural similarity of both problems and structural 
similarity is typically obtained by generalizing over the common structure of 
the problems (see calculation of least general generalizations in section 6.3.3 
and anti-unification in chapter 7 and chapter 12). 
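
As a preview of how a common structure of two terms can be computed, the following Python sketch implements first-order anti-unification in its simplest form. Terms are nested tuples with the function symbol in first position; the generated variable names ("?v0", "?v1", ...) are an assumption of this sketch. It is not the algorithm presented in the later chapters, only a minimal illustration of the principle.

```python
def anti_unify(s, t, table=None):
    """Return a least general generalization of terms s and t.

    Identical subterms are kept, terms with the same function symbol and
    arity are generalized argument-wise, and every other pair of subterms
    is replaced by a variable. The same pair always gets the same variable.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and len(s) == len(t) and s[0] == t[0]):
        args = tuple(anti_unify(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    if (s, t) not in table:
        table[(s, t)] = f"?v{len(table)}"
    return table[(s, t)]
```

For instance, anti-unifying ("last", ("cons", 1, "nil")) with ("last", ("cons", 7, "nil")) gives ("last", ("cons", "?v0", "nil")), and f(a, a) with f(b, b) gives f(?v0, ?v0), since both argument positions differ in the same way.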

We will come back to schema-based synthesis and programming by analogy 
in Part III, addressing the problems of (1) automatic acquisition of abstract 
schemes from example problems, and (2) automatic retrieval of an 
appropriate schema for a given programming problem. 

6.1.3 Pointers to Literature 

Although program synthesis has been an active area of research since the 
beginning of computer science, it is not covered in text books on 
programming, software engineering, or AI. The main reason for that omission 
might be that program synthesis research does not rely on one common 
formalism but that each research group proposes its own approach. This is 
even the case within a given framework, such as synthesis by theorem proving 
or inductive logic programming. Furthermore, the formalisms are typically 
quite complex, such that it is difficult to present a formalism together with 
a complete, non-trivial example in a compact way. 

An introductory overview to knowledge-based software engineering is 
given by Lowry and Duran (1989). Here approaches to and systems for spec- 
ification acquisition, specification validation and maintenance, as well as to 
deductive program synthesis are presented. A collection of influential pa- 
pers in AI and software engineering was edited by Rich and Waters (1986b). 
The papers cover a large variety of research areas including deductive and 
inductive program synthesis, program verification, natural language specifi- 
cation, knowledge based approaches, and programming tutors. Collections of 
research papers on AI and software engineering are presented by Lowry and 
McCarthy (1991), Partridge (1991), and Messing and Campbell (1999). 

An introductory overview of program synthesis is given by Barr and
Feigenbaum (1982). Here the classical approaches to and systems for
deductive and inductive synthesis of functional programs are presented.
A collection of influential papers in program synthesis was edited by
Biermann, Guiho, and Kodratoff (1984), addressing deductive synthesis,
inductive synthesis of functional programs, and grammar inference. A more
recent survey of program synthesis is given by Flener (1995, part 1) and a
survey of inductive logic program synthesis is given by Flener and Yilmaz
(1999).

Current research in program synthesis is, for example, covered in the
journal Automated Software Engineering. AI conferences, especially machine
learning conferences (ECML, ICML), typically have sections on inductive
program synthesis.

6.2 Deductive Approaches

In the following we give a short overview of approaches to deductive program 
synthesis. First we introduce constructive theorem proving and afterwards 
program transformation. 

6.2.1 Constructive Theorem Proving 

Automated theorem proving does not, in general, have to be constructive.
This means, for example, that from a statement $\neg\forall x\,\neg P(x)$
it can be concluded that $\exists x\,P(x)$ without the necessity to
actually construct an object a which has property P. In contrast, for a
proof to be constructive, such an object a has to be given together with a
proof that a has property P (Thompson, 1991). Classical theorem provers
which are applied for program verification, such as the Boyer-Moore
theorem prover (Boyer & Moore, 1975), cannot be used for program synthesis
because they cannot reason constructively about existential
quantifications. As introduced above, a constructive proof of the
statement $\forall x\,\exists y\,[Pre(x) \rightarrow Post(x, y)]$ means
that a program f must be constructed such that for all inputs a for which
$Pre(a)$ holds, $Post(a, f(a))$ holds.

The observation that constructive proofs correspond to programs was
made by many researchers (e. g., Bates & Constable, 1985) and was
explicitly stated as the so-called Curry-Howard isomorphism in the
eighties. One of the oldest approaches to constructive theorem proving was
proposed by Green (1969), introducing a constructive variant of
resolution. In the following, we first present this pioneering work;
afterwards we describe the deductive tableau method of Manna and Waldinger
(1980) and give a short survey of more contemporary approaches.

6.2.1.1 Program Synthesis by Resolution. Green (1969) introduced a
constructive version of clausal resolution and demonstrated with his system
QA3 that constructive theorem proving can be applied for constructing plans
(see sect. 2.2.2 in chap. 2) and for automatic (Lisp) programming. For
automatic programming, the theorem prover needs two sets of axioms: (1)
axioms defining the functions and constructs of (a subset of) a programming
language (such as Lisp), and (2) axioms defining an input/output relation
R(x, y)¹, which is true if and only if x is an appropriate input for some
program and y is the corresponding output generated by this program. Green
addresses four types of automatic programming problems:

Checking: Proving that $R(a, b)$ is true or false.

For a fixed input/output pair it is checked whether the relation R(a, b)
holds.

Simulation: Proving that $\exists x\,R(a, x)$ is true (returning x = b) or
false.

For a fixed input value a, output b is constructed.

Verification: Proving that $\forall x\,R(x, f(x))$ is true or false
(returning x = c as counter-example).

For a user-constructed program f(x), its correctness is checked. For
proofs concerning looping or recursive programs, induction axioms are
required to prove convergence (termination).

Synthesis: Proving that $\forall x\,\exists y\,R(x, y)$ is true (returning
the program y = f(x)) or false (returning x = c as counter-example).

Induction axioms are required to synthesize recursive programs.

¹ Relation R(x,y) corresponds to relation Post(x,y) given in definition 6.1.4.

Green gives two examples for program synthesis: the synthesis of a simple 
non-recursive program which sorts two numbers of a dotted pair, and the 
synthesis of a recursive function for sorting. 

Synthesis of a Non-Recursive Function. For the dotted pair problem, the
following axioms concerning Lisp are given:

1. $x = car(cons(x, y))$

2. $y = cdr(cons(x, y))$

3. $x = nil \rightarrow cond(x, y, z) = z$

4. $x \neq nil \rightarrow cond(x, y, z) = y$

5. $\forall x, y\,[lessp(x, y) \neq nil \leftrightarrow x < y]$.

The axiom specifying the desired input/output relation is given as

$$\forall x\,\exists y\,\bigl[[car(x) < cdr(x) \rightarrow y = x] \wedge [car(x) \geq cdr(x) \rightarrow (car(x) = cdr(y) \wedge cdr(x) = car(y))]\bigr].$$

The axiom states that a dotted pair $(x_1 . x_2)$ which is already sorted
($x_1 < x_2$) is just returned; otherwise, the reversed pair must be
returned. By resolving the input/output axiom with axiom (5), the
mathematical expressions ($<$, $\geq$) are replaced by Lisp expressions
using lessp. Resolving expressions lessp(x,y) = nil with axioms (3) and (4)
introduces conditional expressions, and axioms (1) and (2) are used to
introduce cons-expressions. The resulting program is
y = cond(lessp(car(x),cdr(x)), x, cons(cdr(x),car(x))).
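
The synthesized program can be checked against its input/output axiom by
modeling the Lisp primitives directly. The following Python sketch is only
an illustration under stated assumptions: dotted pairs are modeled as
2-tuples, nil as None, and the names sort_pair and satisfies_spec are
hypothetical helpers that do not appear in Green's work.

```python
# Minimal sketch: Lisp primitives modeled in Python (assumption: a dotted
# pair is a 2-tuple and nil is None).
NIL = None

def cons(x, y):
    return (x, y)

def car(p):
    return p[0]

def cdr(p):
    return p[1]

def lessp(x, y):
    # Axiom 5: lessp(x, y) is non-nil exactly when x < y.
    return True if x < y else NIL

def cond(test, then_value, else_value):
    # Axioms 3 and 4: a nil test selects the third argument, any other
    # test selects the second. Both branches are evaluated eagerly here.
    return else_value if test is NIL else then_value

def sort_pair(x):
    # The synthesized program:
    # y = cond(lessp(car(x), cdr(x)), x, cons(cdr(x), car(x)))
    return cond(lessp(car(x), cdr(x)), x, cons(cdr(x), car(x)))

def satisfies_spec(x, y):
    # The input/output axiom for the dotted pair problem.
    if car(x) < cdr(x):
        return y == x
    return car(x) == cdr(y) and cdr(x) == car(y)
```

Checking satisfies_spec(x, sort_pair(x)) for a handful of pairs corresponds
to Green's "checking" problem for fixed input/output pairs; it is a test,
not a proof of correctness.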

Synthesis of a Recursive Function. To synthesize a recursive program the
theorem prover additionally requires an induction axiom. For example, for a
recursion over finite linear lists, it can be stated that the empty list is
reached in a finite number of steps by stepwise applying cdr(l):

Definition 6.2.1 (Induction over Linear Lists).

$$[P(h(nil)) \wedge \forall x\,[\neg atom(x) \wedge P(h(cdr(x))) \rightarrow P(h(x))]] \rightarrow \forall z\,P(h(z))$$

where P is a predicate and h is a function.

The axiom specifying the desired input/output relation for a list-sorting
function can be stated for example as:

$$\forall x\,\exists y\,[R(nil, y) \wedge [[\neg atom(x) \wedge R(cdr(x), sort(cdr(x)))] \rightarrow R(x, y)]].$$

The function sought is already named sort.

Given the Lisp axioms above, some additional axioms characterizing the
predicates atom(x) and equal(x,y), and axioms specifying a pre-defined
function merge(x,l) (inserting number x into a sorted list l such that the
result is a sorted list), the following function can be synthesized:

y = cond(equal(x,nil), nil, merge(car(x),sort(cdr(x)))).
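
The recursive structure of the synthesized function is that of insertion
sort. The following Python sketch mirrors it under the assumption that
linear lists are modeled as Python lists (car(x) as x[0], cdr(x) as x[1:],
nil as the empty list). The body of merge is our own guess at a function
satisfying the informal description above; the source only states that
merge is axiomatized, not how.

```python
def merge(x, sorted_list):
    """Insert number x into an already sorted list so the result is sorted."""
    if not sorted_list or x <= sorted_list[0]:
        return [x] + sorted_list
    # x belongs somewhere behind the first element.
    return [sorted_list[0]] + merge(x, sorted_list[1:])

def sort(x):
    """y = cond(equal(x, nil), nil, merge(car(x), sort(cdr(x))))"""
    if x == []:
        return []
    return merge(x[0], sort(x[1:]))
```

Note that the recursion on cdr(x) is exactly the step licensed by the
induction axiom of definition 6.2.1.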

Drawbacks of Theorem-Proving. For a given set of axioms and a complete,
formal specification, program synthesis by theorem proving results in a
program which is correct with respect to the specification. The resulting
program is constructed as a side effect of the proof by instantiation of
variable y in the input/output relation R(x,y).

Fully automated theorem proving has several disadvantages as an approach
to program synthesis:

— The domain must be axiomatized completely. 

The given axiomatization has a strong influence on the complexity of search
for the proof and it determines the way in which the program sought is
expressed. For example, Green (1969) reports that his theorem prover
could not synthesize sort in a “reasonable amount of time” with the given
axiomatization. Given a different set of axioms (not reported in the paper),
QA3 created the program cond(x, merge(car(x),sort(cdr(x))), nil).
Providing a set of axioms which are sufficient to prove the input/output
relation and thereby to synthesize a program presupposes that a lot is
already known about the program. For example, to synthesize the sort
program, an axiomatization of the merge function had to be provided.

— It can be more difficult to give a correct and complete input/output spec- 
ification than to directly write the program. 

For the sort example given above, the recursive structure sought is
already given in the specification by separating the case of an empty list as
input from the case of a non-empty list and by presenting the implication
from the rest of a list (cdr(x)) to the list itself.

— Theorem provers lack the power to produce proofs for more complicated 
specifications. 

The main source of inefficiency is that a variety of induction axioms must 
be specified for synthesizing recursive programs. The selection of the “suit- 
able” induction scheme for a given problem is crucial for finding a solution 
in reasonable time. Therefore, selection of the induction scheme to be used 
is often performed by user-interaction! 






These disadvantages of theorem proving hold for the original approach
of Green and still hold for the more sophisticated approaches of today,
which are not restricted to proof by resolution but use a variety of proof
mechanisms (see below). The originally given hard problem of automatic
program construction is reformulated as the equally hard problem of
automatic theorem proving (Rich & Waters, 1986a). Nevertheless, the
proof-as-program approach is an important contribution to program
synthesis: First, the necessity of complete and formal specifications
compels the program designer to state all knowledge involved in program
construction explicitly. Second, and more important, while theorem proving
might not be useful in isolation, it can be helpful or even necessary as
part of a larger system. For example, in transformation systems (see sect.
6.2.2), the applicability of rules can be proved by showing that a
precondition holds for a given expression.

6.2.1.2 Planning and Program Synthesis with Deductive Tableaus.

A second influential contribution to program synthesis as constructive proof
was made by Manna and Waldinger (1980). Like Green, they address not only
deductive program synthesis but also deductive plan construction (Manna &
Waldinger, 1987).

Deductive Tableaus. The approach of Manna and Waldinger is also based 
on resolution. In contrast to the classical resolution (Robinson, 1965) used 
by Green, their resolution rule does not depend on axioms represented in 
clausal form. Their proposed formalism - the deductive tableaus - incor- 
porates not only non-clausal resolution but also transformation rules and 
structural induction. The transformation rules have been taken over from an 
earlier, transformation based programming system called Dedalus (Manna & 
Waldinger, 1975, 1979). 

A deductive tableau consists of three columns

Assertions    Goals        Outputs
Pre(x)
              Post(x,y)    y

with x as input variable, y as output variable, Pre(x) as precondition and
Post(x,y) as postcondition of the program sought. The initial tableau is
constructed from a specification of the form (see also table 6.1)

f(x) <= find y such that Post(x,y) where Pre(x).

In general, the semantics of a tableau with assertions $A_i(x)$ and goals
$G_j(x,y)$ is

$$\forall x\,[A_1(x) \wedge \ldots \wedge A_n(x) \rightarrow \exists y\,[G_1(x,y) \vee \ldots \vee G_n(x,y)]].$$

A proof is performed by adding new rows to the tableau through rules of
logical inference. A proof is successful if a row could be constructed where the
goals column contains the expression “true” and the output column contains
a term which consists only of primitive expressions of the target programming
language.

As an example of a derivation step, we present a simplified version of the
so-called GG-resolution (omitting possible unification of sub-expressions):
For two goals, F in row i and G in row j, a new row can be constructed
with the goal entry

$$F[P \leftarrow true] \wedge G[P \leftarrow false]$$

where P is a common sub-expression of F and G. For the output column,
GG-resolution results in the introduction of a conditional expression:

Assertions    Goals                     Outputs
              $a > b$                   a
              $\neg(a > b)$             b
              $true \wedge \neg false$  if a > b then a else b

where the goal entry of the new row simplifies to true.
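
The simplified GG-resolution step can be mimicked on small expression
trees. The sketch below is an illustration only and makes several
assumptions that are not part of the deductive tableau formalism: formulas
are nested tuples such as ("not", ("gt", "a", "b")), a row is a pair
(goal, output), unification is omitted as in the simplified rule above, and
all function names are hypothetical.

```python
def substitute(formula, target, value):
    """Replace every occurrence of sub-formula `target` in `formula` by `value`."""
    if formula == target:
        return value
    if isinstance(formula, tuple):
        return (formula[0],) + tuple(
            substitute(arg, target, value) for arg in formula[1:])
    return formula

def simplify(formula):
    """Evaluate 'not' and 'and' wherever truth values are already known."""
    if not isinstance(formula, tuple):
        return formula
    op = formula[0]
    args = [simplify(arg) for arg in formula[1:]]
    if op == "not" and isinstance(args[0], bool):
        return not args[0]
    if op == "and":
        if any(arg is False for arg in args):
            return False
        args = [arg for arg in args if arg is not True]
        if not args:
            return True
        if len(args) == 1:
            return args[0]
    return (op,) + tuple(args)

def gg_resolution(row_f, row_g, p):
    """Combine two goal rows over the common sub-expression p."""
    (goal_f, out_f), (goal_g, out_g) = row_f, row_g
    new_goal = simplify(("and",
                         substitute(goal_f, p, True),
                         substitute(goal_g, p, False)))
    # The output column receives a conditional expression.
    return (new_goal, ("if", p, out_f, out_g))

P = ("gt", "a", "b")
row1 = (P, "a")            # goal a > b,       output a
row2 = (("not", P), "b")   # goal not (a > b), output b
new_row = gg_resolution(row1, row2, P)
```

Here new_row carries the goal True together with the output
("if", ("gt", "a", "b"), "a", "b"), which corresponds to the last row of
the tableau above.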

Again, induction is used for recursion formation:

Definition 6.2.2 (Induction Rule). For a program f(x) sought with
assertion Pre(x) and goal Post(x, y), the induction hypothesis is
$\forall u\,[(u \prec x) \rightarrow (Pre(u) \rightarrow Post(u, f(u)))]$,
where $\prec$ is a well-founded ordering over the set of input data {x}.

Recursive Plans. Originally, Manna and Waldinger applied the deductive
tableau method to synthesize functional programs, such as reversing lists or
finding the quotient of two integers (Manna & Waldinger, 1992). Additionally,
they demonstrated how imperative recursive programs can be synthesized by
deductive planning (Manna & Waldinger, 1987). As an example they used the
problem of clearing a block (see also sect. 3.1.4.1 in chap. 3).

The plan sought is called makeclear(a), where a denotes a block in a
blocks-world. Plan construction is based on proving the theorem

$$\forall s_0\,\forall a\,\exists z_1\,[Clear(s_0; z_1, a)]$$

meaning “for an initial state $s_0$, the block a is clear after execution
of plan $z_1$”.

Among the pre-specified axioms for this problem are:

hat-axiom: If ¬Clear(s, x) Then On(s, hat(s,x), x)

(If block x is not clear in situation s then a block hat(s,x) is lying on
block x in situation s, where hat(s,x) is the block which lies on x in s.)

put-table-axiom: If Clear(s, x) Then On(put(s, x, table), x, table)

(If block x is clear in situation s then it lies on the table in a situation
where x was put on the table immediately before.)

and the resulting plan (program) is

makeclear(a) <=
    If Clear(a)
    Then Λ (the current situation)
    Else makeclear(hat(a)); put(hat(a), table).

The complete derivation of this plan using deductive tableaus is reported
in Manna and Waldinger (1987).
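
To see how the recursive plan behaves, it can be executed on a small
blocks-world. The following Python sketch is an illustration under our own
assumptions: a situation is modeled as a dictionary mapping each block to
the object it lies on, situations are passed around explicitly, and the
helper names are hypothetical. It simulates the plan; it does not
reconstruct the deductive derivation.

```python
def hat(state, x):
    """Return the block lying directly on x, or None if x is clear."""
    for block, support in state.items():
        if support == x:
            return block
    return None

def clear(state, x):
    return hat(state, x) is None

def put_on_table(state, x):
    """Return the situation reached by putting the (clear) block x on the table."""
    assert clear(state, x), "only a clear block can be moved"
    new_state = dict(state)
    new_state[x] = "table"
    return new_state

def makeclear(state, a):
    if clear(state, a):
        return state                      # empty plan: situation unchanged
    top = hat(state, a)
    state = makeclear(state, top)         # first clear the block on top of a
    return put_on_table(state, top)       # then put that block on the table

# Tower C on B on A.
tower = {"A": "table", "B": "A", "C": "B"}
result = makeclear(tower, "A")
```

In this sketch the block hat(a) is determined once, before the recursive
call. This is harmless here because clearing hat(a) does not move hat(a)
itself.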

6.2.1.3 Further Approaches. A third “classical” approach to program
synthesis by theorem proving was proposed by Bibel (1980). Current
automatic theorem provers, such as Nuprl² or Isabelle³, can in principle be
used for program synthesis. But up to now, pure theorem proving approaches
can only be applied to small problems. Current research addresses the
problem of enriching constructive theorem proving with knowledge-based
methods, such as proof tactics or program schemes (Kreitz, 1998; Bibel,
Korn, Kreitz, & Kurucz, 1998).

6.2.2 Program Transformation 

Program transformation is the predominant technique in deductive program 
synthesis. Typically, directed rules are used to transform an expression into a 
syntactically different, but semantically equivalent expression. There are two 
principal kinds of transformation: lateral and vertical transformation (Rich 
& Waters, 1986a). Lateral transformation generates expressions on the same 
level of abstraction. For example, a linear recursive program might be rewrit- 
ten into a more efficient tail-recursive program. Program synthesis is realized 
by vertical transformation - rewriting a specification into a program by apply- 
ing transformation rules which represent the relationship between constructs 
on an abstract level and constructs on program level. If the starting point for 
transformation is already an executable (high-level) program, one speaks of 
transformational implementation. 
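
As a small illustration of lateral transformation, the following Python
sketch contrasts a linear recursive function with a tail-recursive version
on the same level of abstraction. The factorial function is our own
example and is not taken from the cited work.

```python
def factorial_linear(n):
    """Linear recursion: the multiplication is pending until the call returns."""
    if n == 0:
        return 1
    return n * factorial_linear(n - 1)

def factorial_tail(n, acc=1):
    """Tail recursion: the recursive call is the last action; the
    accumulator acc carries the partial product."""
    if n == 0:
        return acc
    return factorial_tail(n - 1, acc * n)
```

Both functions compute the same values. Python itself does not eliminate
tail calls, so the efficiency gain only materializes in languages or
compilers that do; the sketch merely shows the shape of the rewriting.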

In the following, we first present the pioneering approach of Burstall and
Darlington (1977). The authors present a small set of powerful rules for
transforming given programs into more efficient programs; that is, they
propose an approach to transformational implementation. The main
contribution of their approach is the introduction of rules for unfolding
and folding (synthesizing) recursive programs. Afterwards, we present the
CIP (computer-aided intuition-guided programming) approach (Broy & Pepper,
1981) which focuses on correctness-preserving transformations. CIP does
not aim at full automatization; it can be classified as a meta-programming
approach (Feather, 1987) which supports a program developer in the
construction of correct programs. Finally, we present the KIDS system
(Smith, 1990), which is the most successful deductive synthesis system
today. KIDS is a program synthesis system which transforms initial, not
necessarily executable, specifications into efficient programs by stepwise
refinement.

² see http://www.cs.cornell.edu/Info/Projects/NuPrl/nuprl.html
³ see http://www.cl.cam.ac.uk/Research/HVG/Isabelle/

6.2.2.1 Transformational Implementation: The Fold Unfold Mechanism.



The mind is a cauldron and things bubble up and show for a moment, then 
slip back into the brew. You can’t reach down and find anything by touch. You 
wait for some order, some relationship in the order in which they appear. 
Then yell Eureka! and believe that it was a process of cold, pure logic. 

- Travis McGee in: John D. MacDonald, The Girl in the Plain Brown Wrapper, 1968



The starting point for the transformation approach of Burstall and
Darlington (1977) is a program, which is presented as a set of equations
with left-hand sides representing program heads and right-hand sides
representing calculations.

For example, the Fibonacci function is presented as

1. fib(0) <= 1

2. fib(1) <= 1

3. fib(x+2) <= fib(x+1) + fib(x).

The following inference rules for transforming recursive equations are 
given: 

Definition: Introduce a new recursive equation whose left-hand expression is
not an instance of the left-hand expression of any previous equation.

We will see below that the introduction of a suitable equation in general
cannot be performed automatically (“eureka” step).

Instantiation: Introduce a substitution instance of an existing equation.

Unfolding: If E <= E' and F <= F' are equations and there is some
occurrence in F' of an instance of E, replace it by the corresponding
instance of E', obtaining F'', and add the equation F <= F''.

This rule corresponds to the expansion of a recursive function by
replacing the function call by the function body with the corresponding
substitution of the parameters.

Folding: If E <= E' and F <= F' are equations and there is some occurrence
in F' of an instance of E', replace it by the corresponding instance of E,
obtaining F'', and add the equation F <= F''.

This is the “inverse” rule to unfolding. We will discuss folding of terms
in detail in chapter 7.

Abstraction: Introduce a where clause by deriving from a previous equation
E <= E' a new equation

E <= E'[u_1/F_1, ..., u_n/F_n] where (u_1, ..., u_n) = (F_1, ..., F_n).

Laws: Algebraic laws, such as associativity or commutativity, can be used
to rewrite the right-hand sides of equations.

The authors propose the following strategy for application of rules 

— Make any necessary definitions. 

— Instantiate. 

— For each instantiation unfold repeatedly. At each stage of unfolding: 

— Try to apply laws and abstractions. 

— Fold repeatedly. 

Transforming Recursive Functions. We will illustrate the approach by
demonstrating how the Fibonacci program given above can be transformed
into a more efficient program which avoids calculating values twice:

1. fib(0) <= 1                                            (given)
2. fib(1) <= 1                                            (given)
3. fib(x+2) <= fib(x+1) + fib(x)                          (given)
4. g(x) <= (fib(x+1), fib(x))                             (definition, eureka!)
5. g(0) <= (fib(1), fib(0))                               (instantiation)
   g(0) <= (1, 1)                                         (unfolding with 1, 2)
6. g(x+1) <= (fib(x+2), fib(x+1))                         (instantiate 4)
   g(x+1) <= (fib(x+1) + fib(x), fib(x+1))                (unfold with 3)
   g(x+1) <= (u+v, u) where (u,v) = (fib(x+1), fib(x))    (abstract)
   g(x+1) <= (u+v, u) where (u,v) = g(x)                  (fold with 4)
7. fib(x+2) <= u + v where (u,v) = (fib(x+1), fib(x))     (abstract 3)
   fib(x+2) <= u + v where (u,v) = g(x)                   (fold with 4)

The new, more efficient definition of Fibonacci is:

fib(0) <= 1
fib(1) <= 1
fib(x+2) <= u + v where (u,v) = g(x)
g(0) <= (1, 1)
g(x+1) <= (u+v, u) where (u,v) = g(x).
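
The effect of the transformation can be observed by transcribing both
definitions into an executable language. The following Python sketch is a
direct transcription, with the hypothetical name fib_naive for the original
equations 1-3; it illustrates the result of the derivation, not the
derivation process itself.

```python
def fib_naive(n):
    """Original equations 1-3: two recursive calls per step."""
    if n < 2:
        return 1
    return fib_naive(n - 1) + fib_naive(n - 2)

def g(x):
    """g(x) = (fib(x+1), fib(x)), computed with one recursive call per step."""
    if x == 0:
        return (1, 1)
    u, v = g(x - 1)
    return (u + v, u)

def fib(n):
    """Transformed definition: fib(x+2) = u + v where (u, v) = g(x)."""
    if n < 2:
        return 1
    u, v = g(n - 2)
    return u + v
```

The original definition makes a number of calls that grows exponentially
with n, whereas the transformed definition makes a linear number of calls
to g.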



Abstract Programming and Data Type Change. Burstall and Darlington 
(1977) also demonstrated how vertical program transformation can be real- 
ized by transforming abstract programs, defined on high-level, abstract data 
types, into concrete programs defined on concrete data. 

Again, we illustrate the approach with an example. Given is the abstract 
data type of labeled trees: 

Abstract Data Type

niltree ∈ labeledTrees
ltree : atoms × labeledTrees × labeledTrees → labeledTrees.

That is, a labeled tree is either empty (niltree) or it is a labeled node
(atoms) which branches into two labeled trees.

Furthermore, a data type of binary trees might be available as basic data 
structure (e. g., in Lisp) with constructors nil and pair. 

Concrete Data Structure

nil ∈ binaryTrees
atoms ∈ binaryTrees
pair : binaryTrees × binaryTrees → binaryTrees.

A labeled tree can be represented as a binary tree by representing each
node as a pair of a label and a binary tree. For example:

ltree(A, niltree, ltree(B, niltree, niltree))

can be represented as

pair(A, pair(nil, pair(B, pair(nil, nil)))).

The relationship between the abstract data type and the concrete data
structure can be expressed by the following representation function:

R : binaryTrees → labeledTrees
R(nil) <= niltree
R(pair(a, pair(p1, p2))) <= ltree(a, R(p1), R(p2)).

The inverse representation function, which in some cases could be
generated automatically, defines how to code the data type:

C : labeledTrees → binaryTrees
C(niltree) <= nil
C(ltree(a, t1, t2)) <= pair(a, pair(C(t1), C(t2))).

The user can write an abstract program, using the abstract data type. 
Typically, writing abstract programs involves less work than writing concrete 
programs because implementation specific details are omitted. Furthermore, 
abstract programs are more flexible because they can be transformed into 
different concrete realizations. 

An example of an abstract program is twist, which mirrors the input tree:

twist : labeledTrees → labeledTrees
twist(niltree) <= niltree
twist(ltree(a, t1, t2)) <= ltree(a, twist(t2), twist(t1))

The goal is to automatically generate a concrete program TWIST(p) =
C(twist(R(p))):

1. TWIST(p) <= C(twist(R(p)))
2. TWIST(nil) <= C(twist(R(nil)))
   (instantiate)
3. TWIST(nil) <= nil
   (unfold C, twist, R, and evaluate)
4. TWIST(pair(a, pair(p1, p2))) <= C(twist(R(pair(a, pair(p1, p2)))))
   (instantiate)
5. TWIST(pair(a, pair(p1, p2))) <= C(twist(ltree(a, R(p1), R(p2))))
   (unfold R)
6. TWIST(pair(a, pair(p1, p2))) <= C(ltree(a, twist(R(p2)), twist(R(p1))))
   (unfold twist)
7. TWIST(pair(a, pair(p1, p2))) <= pair(a, pair(C(twist(R(p2))), C(twist(R(p1)))))
   (unfold C)
8. TWIST(pair(a, pair(p1, p2))) <= pair(a, pair(TWIST(p2), TWIST(p1)))
   (fold with 1)

with the resulting program given by equations 3 and 8.
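
The derived concrete program can be compared with its defining equation
TWIST(p) = C(twist(R(p))) on sample data. The following Python sketch is
an illustration under our own assumptions: both kinds of trees are encoded
as tagged tuples, niltree and nil are represented by strings, and labels
are arbitrary other strings. The functions R, C, twist and TWIST transcribe
the equations given above.

```python
NILTREE = "niltree"   # empty labeled tree
NIL = "nil"           # empty binary tree

def ltree(a, t1, t2):
    return ("ltree", a, t1, t2)

def pair(p1, p2):
    return ("pair", p1, p2)

def R(p):
    """Representation function: binary tree -> labeled tree."""
    if p == NIL:
        return NILTREE
    _, a, (_, p1, p2) = p
    return ltree(a, R(p1), R(p2))

def C(t):
    """Coding function: labeled tree -> binary tree."""
    if t == NILTREE:
        return NIL
    _, a, t1, t2 = t
    return pair(a, pair(C(t1), C(t2)))

def twist(t):
    """Abstract program: mirror a labeled tree."""
    if t == NILTREE:
        return NILTREE
    _, a, t1, t2 = t
    return ltree(a, twist(t2), twist(t1))

def TWIST(p):
    """Concrete program given by equations 3 and 8."""
    if p == NIL:
        return NIL
    _, a, (_, p1, p2) = p
    return pair(a, pair(TWIST(p2), TWIST(p1)))

example = ltree("A", NILTREE, ltree("B", NILTREE, NILTREE))
coded = C(example)
```

TWIST works on the concrete representation directly and never builds the
intermediate labeled tree, which is the point of the transformation.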

Characteristics of Transformational Implementation. To sum up, the ap- 
proach proposed by Burstall and Darlington (1977) has the following char- 
acteristics: 

Specification: Input is a structurally simple program whose correctness is 
obvious (or can easily be proved). 

Basic Rules: A transformation system is based on a set of rules. There are
rules which define semantics-preserving reformulations of the right-hand
side (i. e., the body) of a program (or specification). Other rules, such
as definition and instantiation, cannot be applied mechanically but are
based on creativity (eureka) and insight into the problem to be solved.

Partial Correctness: Transformations are semantics-preserving modulo ter- 
mination. 

Selection of Rules: Providing the sequence of rule applications which leads
to the desired result typically cannot be performed fully automatically.
Burstall and Darlington presented a simple strategy for rule selection.
In general, this strategy cannot be applied without guidance by the
user.

6.2.2.2 Meta-programming: The CIP Approach. The CIP approach
was developed during the seventies and eighties (Broy & Pepper, 1981; Bauer,
Broy, Möller, Pepper, Wirsing, et al., 1985; Bauer, Ehler, Horsch, Möller,
Partsch, Paukner, & Pepper, 1987) as a system to support the formal
development of correct and efficient programs (meta-programming). The
approach has the following characteristics:

Transformation Rules: Describe mappings from programs to programs. They
are represented as directed pairs of program schemes (a → b) together with
an enabling condition (c) defining applicability.

Transformational Semantics: A programming language is considered as term 
algebra. The given transformation rules define congruence relations on 
this algebra and thereby establish a notion of “equivalence of programs” 
(Ehrig & Mahr, 1985). 

Correctness of Transformations: The correctness of a basic set of
transformation rules can be proved in the usual way, by showing that a and
b are equivalent with respect to the properties of the semantic model.
Correctness of all other rules can be shown by transformational proofs: A
transformation rule $T : a \rightarrow b$ is correct if it can be deduced
from a set of already verified rules $T_1, \ldots, T_n$:
$a \rightarrow a_1 \rightarrow \ldots \rightarrow a_n = b$. In general, for
all rules $a \rightarrow b$, it must hold that the relation between a and b
is reflexive and transitive. Properties of recursive programs are proved by
transformational induction: For two recursive programs p and q with bodies
$\tau[p]$ and $\sigma[q]$, every call p(x) can be transformed into a call
q(x) if there is a transformation rule $\tau[y] \rightarrow \sigma[y]$.

Abstract Data Types: Abstract data types are given by a signature (sorts 
and operations) and a set of equations (axioms) (see, e. g. Ehrig & Mahr, 
1985). These axioms can be considered as transformation rules which are 
applicable within the whole scope of the type. (Considering a program- 
ming language as term algebra, the programming language itself can be 
seen as abstract data type with the transformation rules as axioms). 

Assertions: For some transformation rules there might be enabling conditions
which hold only locally. For example, for the rule

    if B then S else T fi  →  S

the enabling condition B can be given or not, depending on the current
input. If B holds, then the conditional statement evaluates to S. Such
local properties are expressed by assertions which can be provided by
the programmer. Furthermore, transformation rules can be defined for
generating or propagating assertions.

Again, we illustrate the approach with an example, which is presented in
detail in Broy and Pepper (1981). The program sought is a realization of
the Warshall algorithm for calculating the transitive closure of a graph.

The graph is given by its characteristic function

- function edge (x: node, y: node): bool;
    (true, if nodes x and y are directly connected).

A path is defined as a sequence of nodes in which each pair of successive
nodes is directly connected:

- function ispath (s: seq(node)): bool;
    $\forall s_1, s_2 : seq(node),\ x, y : node :: s = s_1 \circ x \circ y \circ s_2 \rightarrow edge(x, y)$.

The starting point for program transformation is the specification of the
function sought:

- function trans (x: node, y: node): bool;
    $\exists p : seq(node) :: ispath(x \circ p \circ y)$.

The basic idea (eureka!) is to define a more general function, which only
considers paths whose inner nodes are among the first i nodes (where nodes
are given as natural numbers 1 ... n):

- function tc (i: nat, x: node, y: node): bool;
    $\exists p : seq(node) :: ispath(x \circ p \circ y) \wedge \forall z : node :: z \in p \rightarrow z \leq i$.

In a first step, it must be proved that tc(n,x,y) = trans(x,y), where n is
the number of all nodes in the graph:

- Lemma 1: $tc(n,x,y) = trans(x,y)$
- $tc(n,x,y) = \exists p : seq(node) :: ispath(x \circ p \circ y) \wedge \forall z : node :: z \in p \rightarrow z \leq n$
  (unfold)
- $= \exists p : seq(node) :: ispath(x \circ p \circ y)$
  (simplification: $z \leq n$ holds for all nodes z)
- $= trans(x,y)$ (fold).

To obtain an executable function from the given specification, a next
eureka step is the introduction of recursion. This is done by formulating a
base case (for i = 0) and a recursive case:

- Lemma 2: $tc(0,x,y) = edge(x,y)$
- Lemma 3: $tc(i+1,x,y) = tc(i,x,y) \vee [tc(i,x,i+1) \wedge tc(i,i+1,y)]$.

The proof of lemma 2 is analogous to that of lemma 1. The proof of
lemma 3 involves an additional case analysis (all inner nodes of a path are
smaller than or equal to i vs. the path contains node i + 1) and the
application of logical rules.

The proved lemmas give rise to the program

- function tc (i: nat, x: node, y: node): bool;
    if i = 0
    then edge(x,y)
    else tc(i-1,x,y) ∨ [tc(i-1,x,i) ∧ tc(i-1,i,y)].
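
The recursive program obtained from lemmas 2 and 3 can be run against the
original specification on small graphs. The following Python sketch is an
illustration under our own assumptions: nodes are the integers 1 ... n, the
graph is a set of directed edges, and trans is checked by brute force over
candidate sequences of distinct inner nodes (which suffices for
reachability). The names make_tc and trans_spec are hypothetical.

```python
from itertools import permutations

def make_tc(edges):
    """Build tc(i, x, y) for a graph given as a set of (x, y) edges."""
    def edge(x, y):
        return (x, y) in edges

    def tc(i, x, y):
        if i == 0:
            return edge(x, y)                     # lemma 2
        # Lemma 3, with i+1 renamed to i.
        return tc(i - 1, x, y) or (tc(i - 1, x, i) and tc(i - 1, i, y))

    return tc

def ispath(s, edges):
    """Each pair of successive nodes in s is directly connected."""
    return all((a, b) in edges for a, b in zip(s, s[1:]))

def trans_spec(edges, n, x, y):
    """Specification: some sequence p of inner nodes makes x o p o y a path."""
    nodes = range(1, n + 1)
    return any(ispath((x,) + p + (y,), edges)
               for k in range(n + 1)
               for p in permutations(nodes, k))

edges = {(1, 2), (2, 3)}
n = 3
tc = make_tc(edges)
```

The recursive tc recomputes many subproblems and is therefore inefficient;
it serves only to check the lemmas on examples.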

This example demonstrates how the development of a program from a
non-executable initial specification can be supported by CIP. Program
development was mainly “intuition-guided”, but the transformation system
supports the programmer in two ways: First, the notion of program
transformation results in systematic program development by stepwise
refinement, and second, the transformation system (partially)
automatically generates proofs for the correctness of the programmer's
intuitions.

Given the recursive function tc, in a next step it can be transformed from 
its inefficient recursive realization into a more efficient version realized by for- 
loops. This transformational implementation can be performed automatically 
by means of sophisticated (and verified) transformation rules. 

6.2.2.3 Program Synthesis by Stepwise Refinement: KIDS. KIDS⁴
is currently the most successful system for deductive (transformational)
program synthesis. KIDS supports the development of correct and efficient
programs by applying consistency-preserving transformations to an initial
specification, first transforming the specification into an executable but
inefficient program and then optimizing this program. The basis of KIDS is
Refine, a wide-spectrum language for representing specifications as well
as programs. The final program is represented in Common Lisp.

In KIDS a variety of state-of-the-art techniques are integrated into one
system, and KIDS makes use of a large, formalized amount of domain and
programming knowledge: It offers program schemes for divide-and-conquer,
global search (binary search, depth-first search, breadth-first search),
and local search (hill climbing); it makes use of a library of reusable
domain axioms (equations specifying abstract data types); it integrates
deductive inference (with about 500 inference rules represented as
directed transformation rules); and it includes a variety of techniques
for program optimization (such as expression simplification and finite
differencing).

KIDS is an interactive system where the user typically goes through the 
following steps of program development (Smith, 1990): 

Develop a Domain Theory: The user defines types and functions and pro- 
vides laws that allow high-level reasoning about the defined functions. 
The user can develop the theory from the scratch or make use of a hier- 
archic library of types. 

For example, a domain theory for the fc-Queens problem gives boolean 
functions representing the constraints that no two queens can be in the 
same row or column and that no two queens can be in the same diago- 
nal of a chessboard. Laws for reasoning about functions typically include 
monotonicity and distributive laws. 

Create a Specification: The user enters a specification statement in terms of 
the domain theory. 

For example, a specification of queens states that for k queens on a kx k 
board, a set of board positions must be returned for which the constraints 
defined in the domain theory hold. 

Apply a Design Tactic: The user selects a program scheme, representing a 
tactic for algorithm design, and applies it to the specification. This is 
the crucial step of program development - transforming a typically non- 
executable specification into an (inefficient) high-level program. 

For example, the k-Queens specification can be solved with global search,
which can be refined to depth-first search (find legal solutions using
backtracking).

Apply Optimizations: The user selects an optimization operation and a pro- 
gram expression to which it should be applied. The optimization tech- 
niques are fully automatic. 

Footnote: see http://www.kestrel.edu/HTML/prototypes/kids.html

Apply Data Type Refinements: The user can select implementations for the
high-level data types in the program. 

Which implementation of a data type will result in the most efficient 
program depends on the kind of operations to be performed. For example, 
sets could be realized as lists, arrays, or trees. 

Compile: The code is compiled into machine-executable form (first Common 
Lisp and then machine code). 
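
The first development steps for the k-Queens example can be sketched in ordinary code. The following is a minimal illustration in Python, not KIDS output and not the REFINE specification language: the function names are hypothetical, the two constraint functions stand in for the domain theory, `satisfies_spec` stands in for the specification, and `queens` is a hand-written instance of global search refined to depth-first search with backtracking.

```python
from itertools import combinations

# "Domain theory" (illustrative): boolean constraints on (row, col) positions.
def no_two_in_same_row_or_column(positions):
    rows = [r for r, _ in positions]
    cols = [c for _, c in positions]
    return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)

def no_two_on_same_diagonal(positions):
    return all(abs(r1 - r2) != abs(c1 - c2)
               for (r1, c1), (r2, c2) in combinations(positions, 2))

# "Specification" (illustrative): k positions on a k x k board for which
# the constraints of the domain theory hold.
def satisfies_spec(k, positions):
    return (len(positions) == k
            and all(0 <= r < k and 0 <= c < k for r, c in positions)
            and no_two_in_same_row_or_column(positions)
            and no_two_on_same_diagonal(positions))

# "Design tactic" (illustrative): depth-first search with backtracking,
# placing one queen per row and pruning partial placements early.
def queens(k):
    def extend(partial):
        row = len(partial)
        if row == k:
            yield tuple(partial)
            return
        for col in range(k):
            candidate = partial + [(row, col)]
            if (no_two_in_same_row_or_column(candidate)
                    and no_two_on_same_diagonal(candidate)):
                yield from extend(candidate)
    yield from extend([])
```

Every placement produced by `queens(k)` can be checked against `satisfies_spec(k, ...)`, which mirrors the requirement that the derived program is correct with respect to the specification. The optimization and data type refinement steps are not modeled here.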

The k-Queens example is given in detail in Smith (1990). The idea of
program development by stepwise refinement of abstract schemes is also
illustrated in Smith (1985) for constructing a divide-and-conquer based
search algorithm.

The KIDS system demonstrates that automatic program synthesis can cover
all steps of program development, from high-level specifications to
efficient programs. Furthermore, it gives some evidence for the claim of
KBSE that automation can result in a higher level of productivity in
software development. High productivity is also due to the possibility
of reusing domain axioms. But of course, the system is not “auto-magic” - it
strongly depends on interactions with an expert user, especially for the
initial development steps from domain theory to selection of a suitable
design tactic. These first steps can be seen as an approach to
meta-programming, similar to CIP. Even for optimization, the user must know
which parts of the program could be optimized in what way in order to
select the appropriate expressions and the rules to be applied to them.

6.2.2.4 Concluding Remarks. In principle, there is no fundamental
difference between constructive theorem proving and program transformation:
In both cases, the starting point is a formal specification and the output
is a program which is correct with respect to the specification. Theorem
proving can be converted into transformation if axioms are represented as
rules (equations may correspond to bi-directional rules) and if proof rules
(such as resolution) are represented as special rewrite rules. The
advantages of transformation over proof systems for program synthesis are
that forward reasoning and a smaller formal overhead make program
derivation somewhat easier (Kreitz, 1998). In contrast, proof systems are
typically used for (ex-post) program verification.

As described above, program transformation systems might include
non-verified rules representing domain knowledge which are applied in
interaction with the system user. Such rules are mainly applied during the
first steps of program development until an initial executable program is
constructed. The following optimization steps (transformational
programming) mainly rely on verified rules which can be applied fully
automatically. Code optimization by transformation is typically dealt with
in textbooks on compilers (Aho, Sethi, & Ullman, 1986). A good introduction
to program transformation and optimization is given in Field and Harrison
(1988).

Footnote (on Smith, 1985): Here the system Cypress, which was the
predecessor of KIDS, was used.

An interesting approach to “reverse” transformation - from efficient loops
(tail recursion) to more easily verifiable (linear) recursion - is proposed by 
Giesl (2000). 



6.3 Inductive Approaches 

In the following we introduce inductive approaches to program synthesis.
First, basic concepts of inductive inference are introduced. Inductive
inference is researched in philosophy (epistemology), in cognitive
psychology, and in artificial intelligence. In this section we focus on
induction in AI, that is, on machine learning. In chapter 9 we discuss some
relations between inductive program synthesis and inductive learning of
strategies in human problem solving. Grammar inference is introduced as the
theoretical background for program synthesis. In the following sections,
three different approaches to program synthesis are described: genetic
programming, inductive logic programming, and synthesis of functional
programs. Synthesis of functional programs is the oldest and genetic
programming the newest approach. In genetic programming, hypothesis
construction relies most strongly on search in hypothesis space, while in
functional program synthesis hypothesis construction is most strongly
guided by the structure of the examples.

6.3.1 Foundations of Induction 

As introduced in section 6.1.2.2, induction means the inference of generalized
rules from examples. The ability to perform induction is a presupposition for
learning; often induction and learning are used as synonyms.

6.3.1.1 Basic Concepts of Inductive Learning. Inductive learning can
be characterized in the following way (Mitchell, 1997, p. 2):

Definition 6.3.1 (Learning). A computer program is said to learn from
experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves
with experience E.

In the following, we will flesh out this definition.

Learning Tasks. Learning tasks can be roughly divided into classification
tasks versus performance tasks, labeled as concept learning versus skill
acquisition. Examples of concept learning are recognizing handwritten
letters or identifying dangerous substances. Examples of skill acquisition
are navigating without bumping into obstacles or controlling a chemical
process such that certain variables stay in a pre-defined range.

In inductive program synthesis, the learning task is to construct a program
which transforms input values from a given domain into the desired output
values. A program can be seen as a representation of a concept: Each input
value is classified according to the operations which transform it into the
output value. In particular, programs returning boolean values can be viewed
as concept definitions. Typical examples are recursive definitions of odd or
ancestor (see fig. 6.1). A program can also be seen as a representation of a
(cognitive) skill: The program represents the operations which must be
performed for a given input value to fulfill a certain goal. For example, a
program quicksort represents the skill of efficiently sorting lists, and a
program hanoi represents the skill of solving the Tower of Hanoi puzzle
with a minimum number of moves (see fig. 6.1).



; functional program deciding whether a natural number is odd
odd(x) = if (x=0) then false else if (x=1) then true else odd(x-2)

; logical program deciding whether p is ancestor of f
ancestor(P,F) :- parent(P,F).
ancestor(P,F) :- parent(P,I), ancestor(I,F).

; logical program for quicksort
qsort([X|Xs],Ys) :-
    partition(Xs,X,S1,S2),
    qsort(S1,S1s),
    qsort(S2,S2s),
    append(S1s,[X|S2s],Ys).
qsort([],[]).

; functional program for hanoi
hanoi(n,a,b,c) = if (n=1) then move(a,b)
                 else append(hanoi((n-1),a,c,b), move(a,b),
                             hanoi((n-1),c,b,a))

Fig. 6.1. Programs Represent Concepts and Skills 
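
The two functional programs of fig. 6.1 can be transcribed directly into an executable language. The sketch below uses Python; representing a move as a tuple `("move", from_peg, to_peg)` and the result as a list is an assumption made for illustration, not part of the figure.

```python
def odd(x):
    # Concept definition: classifies a natural number as odd or not odd.
    if x == 0:
        return False
    if x == 1:
        return True
    return odd(x - 2)

def hanoi(n, a, b, c):
    # Skill: move a tower of n discs from peg a to peg b using peg c.
    if n == 1:
        return [("move", a, b)]
    return (hanoi(n - 1, a, c, b)
            + [("move", a, b)]
            + hanoi(n - 1, c, b, a))
```

For n discs the returned sequence contains 2^n - 1 moves, the minimum for this puzzle; for example, `hanoi(2, "A", "B", "C")` yields the three moves A to C, A to B, C to B.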



Learning Experience. Learning experience typically is provided by
presenting a set of training examples. If the examples are pre-classified,
we speak of supervised learning or learning with a teacher; otherwise we
speak of unsupervised learning or learning from observation.

For example, for learning a single concept - such as dangerous versus
harmless substance - the system might be provided with a number of
chemical descriptions labeled as positive (i.e., dangerous) and negative
(i.e., harmless) instances of the concept to be learned. For learning to
control a chemical process, the learning system might be provided with
current values of a set of control variables, labeled with the appropriate
operations which must be performed in that case; each operation sequence
then constitutes a class to be learned. Alternatively, for skill
acquisition a learning system might provide its own experience, performing
a sequence of operations and getting feedback for the final outcome
(reinforcement learning). In this case, the examples are labeled only
indirectly - the system is confronted with the credit assignment problem,
determining the degree to which each performed operation is responsible
for the final outcome.

In the simplest form, examples might consist just of a set of attributes.
Such attributes might be categorical (substance contains fluorine: yes/no)
or metric (current temperature has value x). In general, examples might be
structured objects, that is, sets of relations or terms. For instance, for
the ancestor problem (see fig. 6.1) kinship relations between different
persons constitute the learning experience. In program synthesis, examples
are always structured objects.

Training examples can be presented stepwise (incremental learning) or
all at once (“batch” learning). Both learning modes are possible for program
synthesis. Finally, training examples should be a representative sample of
the concept or skill to be learned. Otherwise, the learning system might
perform successfully only for a subset of possible situations.

In table 6.2 possible training examples for the ancestor and the hanoi
problems are given. In the first case, given some parent-child relations,
positive and negative examples for the concept “ancestor” are presented. In
the second case, examples are associated with sequences of move operations
for different numbers of discs. The hanoi example can be classified as input
to supervised or unsupervised learning: The number of discs (n = i,
i = 1 ... 3) represents a possible input and the move operations represent
the class with which the input must be associated (see Briesemeister et al.,
1996). Alternatively, the disc-number/operations pairs can be seen as
patterns and the learning task is to extract their common structure
(descriptive generalization). While in the first case the different
operator sequences constitute different classes to be learnt, in the second
case all examples are positive. The examples are ordered with respect to
the number of discs. Therefore, in an indirect way, they also contain
information about which cases do not belong to the domain.

Additional Information. The learning system might be provided with addi- 
tional information besides the training examples. For instance, in the ancestor 
problem in table 6.2 some parent-child relations are directly given. These facts 
are essential for learning: It is not possible to come up with a terminating
recursive rule (as given in fig. 6.1) without reference to a direct relation
such as parent.

If a system is given additional information besides the training examples, 
this is typically called background knowledge. Background knowledge can have 
different functions for learning: it can be necessary for hypothesis construc- 
tion, it can make hypothesis construction more efficient, or it can make the 
resulting hypothesis more compact or readable. 

The ancestor example could have been presented in a different way, as
shown in table 6.3. Instead of giving parent relations, mother and father
relations can be used, together with a rule stating that father(X,Y) or
mother(X,Y) implies parent(X,Y). Another example of using background
Table 6.2. Training Examples
; ANCESTOR(X,Y)

parent(peter,robert)   parent(mary,robert)
parent(robert,tim)     parent(tim,anna)
parent(tim,john)       parent(julia,john)

; pre-classified positive and negative examples
ancestor(julia,john)   not(ancestor(anna,tim))
ancestor(peter,tim)    not(ancestor(anna,mary))
ancestor(peter,anna)   not(ancestor(john,robert))

; HANOI(N,A,B,C)

(N = 1) --> (move(A,B))
(N = 2) --> (move(A,C), move(A,B), move(C,B))
(N = 3) --> (move(A,B), move(A,C), move(B,C),
             move(A,B), move(C,A), move(C,B),
             move(A,B))



knowledge for learning is to give pre-defined rules for append and partition
for learning quicksort (see fig. 6.1).



Table 6.3. Background Knowledge
; ANCESTOR(X,Y)

father(peter,robert)   mother(mary,robert)
father(robert,tim)     father(tim,anna)
father(tim,john)       mother(julia,john)

parent(X,Y) :- mother(X,Y).
parent(X,Y) :- father(X,Y).
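
The role of background knowledge can be made concrete with a small check. The following Python sketch encodes the father and mother facts of table 6.3 as sets of pairs, derives parent from them, and tests the recursive ancestor hypothesis of fig. 6.1 against the pre-classified examples of table 6.2. The set-based encoding and the helper names are assumptions chosen for illustration; a logic programming system would represent the same information as clauses.

```python
# Background knowledge: facts from table 6.3, encoded as (X, Y) pairs.
father = {("peter", "robert"), ("robert", "tim"),
          ("tim", "anna"), ("tim", "john")}
mother = {("mary", "robert"), ("julia", "john")}
people = {p for pair in father | mother for p in pair}

def parent(x, y):
    # Background rule: mother(X,Y) or father(X,Y) implies parent(X,Y).
    return (x, y) in father or (x, y) in mother

def ancestor(x, y):
    # Hypothesis: the recursive definition given in fig. 6.1.
    if parent(x, y):
        return True
    return any(parent(x, i) and ancestor(i, y) for i in people)

# Pre-classified training examples from table 6.2.
positive = [("julia", "john"), ("peter", "tim"), ("peter", "anna")]
negative = [("anna", "tim"), ("anna", "mary"), ("john", "robert")]

def consistent(hypothesis):
    return (all(hypothesis(x, y) for x, y in positive)
            and not any(hypothesis(x, y) for x, y in negative))
```

Here `consistent(ancestor)` holds, whereas the non-recursive hypothesis `parent` alone fails on the positive examples that span more than one generation. The recursion terminates because the given parent relation contains no cycles.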



Additional information can also be provided in the form of an oracle: The
learning system constructs new examples which are consistent with its current
hypothesis and asks an oracle whether the example belongs to the class to
be learned. The oracle is usually the teacher.

Representing Hypotheses. The task of the learning system is to construct 
generalized rules (representing concepts or skills) from the training examples 
and background knowledge. In other words, learning means to construct an 
intensional representation from a sample of an extensional representation of 
a concept or skill. As mentioned in section 6.1.2.2, such generalized rules
have the status of hypotheses.

For supervised concept learning, a hypothesis is a special case of a function
f : X → Y which maps a given object belonging to the learning domain to
a class. In general, f can be a linear or a non-linear function, mapping a
vector of attribute values X into an output value Y, representing the class
the object described by this vector is supposed to belong to. Alternatively,
hypotheses can be represented symbolically - the most prominent examples
are decision trees.

In the context of program synthesis, inputs X are structural descriptions
(relations or terms), outputs Y are operator sequences transforming the
input into the desired output, and hypotheses are represented as logical or
functional programs (see fig. 6.1): For the ancestor problem, function f(x)
is represented by clauses ancestor(X,Y) which return true or false for a
given set of parent-child relations and two persons. For the hanoi problem,
function f(x) is represented by a functional program which returns a
sequence of move operations to transport a tower of n discs from the start
peg to the goal peg.

Identification Criteria and Performance Measure. There are two stages in
which the learning system is evaluated: First, a decision criterion is needed
to terminate the learning algorithm, and, second, the quality of the learned
hypothesis must be tested. An algorithm terminates if some identification
criterion with respect to the training examples is fulfilled. The classical
concept here is identification in the limit, proposed by Gold (1967). The
learner terminates if the current hypothesis holds at some point during
learning for all examples. An alternative concept is PAC (probably
approximately correct) identification (Valiant, 1984). The learner
terminates if the current hypothesis can be guaranteed with a high
probability to be consistent with future examples.

The first model represents an all-or-nothing perspective on learning: a
hypothesis must be totally correct with respect to the seen examples. The
second model is based on probability theory. Most approaches to concept
learning are based on PAC-learnability, while program synthesis typically is
based on Gold’s theory (see sect. 6.3.1.2). An analysis of program synthesis
from traces in the PAC theory was presented by Cohen (1995, 1998).

To evaluate the quality of the learned hypothesis, the learning system is
presented with new examples and generalization accuracy is obtained.
Typically, the same criterion is used as for termination.

Learning Algorithm. A learning algorithm gets training examples and back- 
ground knowledge as input and returns a hypothesis as output. 

The majority of learning algorithms construct non-recursive hypotheses
for supervised concept learning where examples are represented as
attribute vectors (Mitchell, 1997). Among the most prominent approaches are
function approximation algorithms (such as perceptron, back-propagation, or
support vector machines), Bayesian learners, and decision tree algorithms.
For unsupervised concept learning, the dominant approach is cluster analysis.

Approaches for learning recursive hypotheses from structured examples
- that is, approaches to inductive program synthesis - are genetic
programming, inductive logic programming, and synthesis of functional
programs. These approaches will be described in detail below.

In general, a learning algorithm is a search algorithm in the space of
possible hypotheses. There are two main learning mechanisms: data-driven
approaches start with the most specific hypothesis, that is, the training
examples, and approximation-driven (or model-driven) approaches start with a
set of approximate (incomplete, incorrect) hypotheses. Data-driven learning
is incremental, iterating over the training examples (Flener, 1995). Each
iteration returns a hypothesis which might be the final hypothesis searched
for. Approximation-driven learning is non-incremental. Current hypotheses
are generalized if not all positive examples are covered, or specialized if
examples other than the positive ones are covered.
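
A data-driven learner for the simple attribute-vector case can be sketched in a few lines. The sketch below follows the Find-S scheme described by Mitchell (1997), which is one concrete instance of starting with the most specific hypothesis and generalizing incrementally; it is not taken from this text. The attribute values are invented for illustration, and "?" is used as a wildcard that matches any value.

```python
def covers(hypothesis, example):
    # A hypothesis covers an example if every attribute matches or is "?".
    return hypothesis is not None and all(
        h == "?" or h == e for h, e in zip(hypothesis, example))

def generalize(hypothesis, example):
    # Minimal generalization: keep agreeing attributes, wildcard the rest.
    if hypothesis is None:            # most specific: covers nothing yet
        return tuple(example)
    return tuple(h if h == e else "?" for h, e in zip(hypothesis, example))

def data_driven_learner(examples):
    # Incremental: one (possibly final) hypothesis per training example.
    hypothesis = None
    for attributes, is_positive in examples:
        if is_positive and not covers(hypothesis, attributes):
            hypothesis = generalize(hypothesis, attributes)
        yield hypothesis

# Hypothetical training data: (contains, temperature) -> dangerous?
training = [(("fluorine", "hot"), True),
            (("none", "cold"), False),
            (("fluorine", "cold"), True)]
guesses = list(data_driven_learner(training))
```

After the third example the hypothesis is `("fluorine", "?")`. This scheme ignores negative examples, so it only illustrates the generalization direction; the specialization step of approximation-driven learning is not shown.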

Induction Biases. As usual, there is a trade-off between the expressiveness
of the hypothesis language and the efficiency of the learning algorithm (cf.
the discussion of domain specification languages and efficiency of planning
algorithms, chap. 2). Typically, in machine learning search is restricted by
so-called induction biases (Flener, 1995, pp. 33): Each learning algorithm is
restricted by a syntactic bias, that is, a restriction of the hypothesis
language. For logical program synthesis, the language can be restricted to
a subset of Prolog clauses. For functional program synthesis, the form of
functions can be restricted by program schemes. Furthermore, a semantic
bias might restrict the vocabulary available for the hypothesis. For
example, when synthesizing Lisp programs, the language primitives might be
restricted to car, cdr, cons, and atom (see sect. 6.3.4).
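
The effect of a restricted vocabulary on the search space can be illustrated with a toy synthesizer. The Python sketch below searches only over compositions of car and cdr (a subset of the four primitives named above, chosen to keep the example short) and returns the shortest composition consistent with the input/output examples. The representation of a program as a tuple of primitive names and the bound `max_length` are assumptions for illustration; this is not one of the synthesis methods discussed in this chapter.

```python
from itertools import product

def car(x):
    return x[0]

def cdr(x):
    return x[1:]

PRIMITIVES = {"car": car, "cdr": cdr}   # restricted vocabulary

def run(program, value):
    # Apply the primitives left to right; None signals an undefined result.
    for name in program:
        if not isinstance(value, list) or not value:
            return None
        value = PRIMITIVES[name](value)
    return value

def synthesize(examples, max_length=4):
    # Enumerate programs by increasing length; return the first consistent one.
    for length in range(max_length + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

third = synthesize([([1, 2, 3], 3), (["a", "b", "c", "d"], "c")])
```

For the two examples shown, the search returns `("cdr", "cdr", "car")`, that is, the third element of a list. With only two primitives there are 2^n candidate programs of length n, so the bias keeps the enumeration small, at the price of excluding every program that needs cons, atom, or recursion.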

Machine Learning Literature. The AI textbooks of Winston (1992) and Russell
and Norvig (1995) include chapters on machine learning techniques. An
excellent textbook on machine learning was written by Mitchell (1997).
Collections of important research papers, including work on inductive program
synthesis, are presented by Michalski, Carbonell, and Mitchell (1983) and
Michalski, Carbonell, and Mitchell (1986). A collection of papers on
knowledge acquisition and learning was edited by Buchanan and Wilkins (1993).
Flener (1995) discusses machine learning in relation to program synthesis.

6.3.1.2 Grammar Inference. Most of the basic concepts of machine learning
as introduced above have their origin in the work of Gold (1967). Gold
modeled learning as the acquisition of a formal grammar from examples and
provided a classification of models of language learnability together with
fundamental theoretical results. Grammar inference research developed from
Gold’s work in two directions: providing theoretical results on learnability
for certain classes of languages given a certain kind of learning model and,
more recently, developing efficient learning algorithms for some classes
of formal grammars. We focus on the conceptual and theoretical aspect of
grammar inference, that is, on algorithmic learning theory. Learning theory
can be seen as a special case of complexity theory where the complexity of
languages is researched with respect to the effort of their enumeration (by
application of rules of a formal grammar) or with respect to the effort of
their identification by an automaton.

Definition 6.3.2 (Formal Language). For a given non-empty finite set
A, called the alphabet of the language, A* represents the set of all finite
strings over elements from A. A language L is defined as: L ⊆ A*.

Learning means to assign a grammar (called naming relation by Gold) to 
a language L after seeing a sequence of example strings: 

Definition 6.3.3 (Language Learning). For a sequence of time steps, the
learner is at each time step t confronted with a unit of information i_t
concerning the unknown language L. A stepwise presentation of units i_t is
called a training sequence. At each time step, the learner makes a guess g_t
of the grammar characterizing L, based on the information it has received
from the start of learning to time step t. That is, the learner is a
function G with g_t = G(i_1, ..., i_t).

As introduced above, Gold introduced the concept of “identification in 
the limit” to define the learnability of a language: 

Definition 6.3.4 (Identification in the Limit). L is identified in the
limit if after a finite number of time steps the guesses are always the same,
that is, g_t = g_{t+i} for all i > 0. A class of languages L is called
identifiable in the limit if there exists an algorithm such that each
language of the class will be identified in the limit for any allowable
training sequence.

Identification in the limit is dependent on the method of information
presentation. Gold distinguishes presentation by text from learning with an
informant. Learning from text means learning from positive examples only,
where examples are presented in an arbitrary order. Learning with an
informant corresponds to supervised learning. The informant can present
positive and negative examples and can provide some methodical enumeration
of examples. Gold investigates three modes of learning from text and three
modes of learning with an informant:

- Method of information presentation: Text
  A text is a sequence of strings x_1, x_2, ... from L such that each string
  of L occurs at least once. At time t the learner is presented x_t.
  - Arbitrary Text: x_t may be any function of t.
  - Recursive Text: x_t may be any recursive function of t.
  - Primitive Recursive Text: x_t may be any primitive recursive function
    of t.

- Method of information presentation: Informant
  An informant for L tells the learner at each time t whether a string y_t
  is an element of L. There are different ways of choosing y_t:
  - Arbitrary Informant: y_t may be any function of t as long as every
    string of A* occurs at least once.
  - Methodical Informant: An enumeration is assigned a priori to the strings
    of A* and y_t is the t-th string of the enumeration.
  - Request Informant: At time t the learner chooses y_t on the basis of
    information received so far.

The method “request informant” is currently an active area of research 
called active learning (Thompson, Califf, & Mooney, 1999). Gold showed that 
all three methods of information presentation by informant are equivalent 
(Theorem 1.3, in Gold, 1967). Obviously, it is a harder problem to learn from 
text only than to learn from an informant, because in the second case not 
only positive but also negative examples are given. 

Identification in the limit is also dependent on the “naming relation”, that
is, on the way in which a hypothesis about language L, that is, a grammar,
is represented. The two possible naming relations for languages are

- Language Generator: A Turing machine which generates L.
  A generator is a function from positive integers (corresponding to time
  steps t) to strings in A* such that the function range is exactly L. A
  generator exists iff L is recursively enumerable.

- Language Tester: A Turing machine which is a decision procedure for L.
  A tester is a function from strings to {0, 1} with value 1 for strings in L
  and value 0 otherwise.

Testers can be transformed into generators (enumerate the strings of A* and
keep those the tester accepts) but not the other way round, and it is
simpler to learn a generator for a class of languages than a tester
(Gold, 1967).
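
The relation between the two naming relations can be illustrated by building a generator from a tester. The Python sketch below enumerates A* in order of increasing length and emits exactly the strings the tester maps to 1. The example language (strings over {a, b} with an even number of a's) and all function names are assumptions for illustration; Python generators stand in for Turing machines, and for a finite language the loop would simply stop producing new strings.

```python
from itertools import count, islice, product

def all_strings(alphabet):
    # Enumerate A*: the empty string first, then by increasing length.
    for n in count(0):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def generator_from_tester(tester, alphabet):
    # Keep exactly those strings for which the decision procedure says 1.
    for s in all_strings(alphabet):
        if tester(s) == 1:
            yield s

def even_number_of_a(s):
    # Tester: a function from strings to {0, 1}.
    return 1 if s.count("a") % 2 == 0 else 0

first_five = list(islice(generator_from_tester(even_number_of_a, "ab"), 5))
```

The opposite direction has no such construction in general: a generator for a recursively enumerable language only ever confirms membership, so a string that has not been generated yet might still appear later.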

The definition of learnability as identification in the limit together with 
a method of information presentation and a naming relation constitutes a 
language learnability model. 

Gold presented some methods for identification in the limit, the most 
prominent one is identification by enumeration. 

Definition 6.3.5 (Identification by enumeration). Let D be a description
for a class of languages where the elements of D are effectively
enumerable. Let L(d) be the language denoted by description d. For each time
step t, search for the least k such that i_t is in L(d_k) and that all
preceding i_j, j = 1 ... t-1, are in L(d_k), too.

Footnote (on Definition 6.3.5): The definition follows the formulation of
Angluin (1984) rather than the original formulation of Gold (1967).

Identification by enumeration presupposes a linear ordering of possible
hypotheses. Each incompatibility between the training examples seen so far and
the current hypothesis results in the elimination of this hypothesis. That is,
if hypotheses are effectively enumerable, it is guaranteed that the t-th guess
is the earliest description compatible with the first t elements of the
training sequence and that learning will converge to the first description
of L(d) in the given enumeration D. In practice, enumeration is mostly not
efficient. It is often possible to organize the class of hypotheses in a
better way such that whole subclasses can be eliminated after some
incompatibility is identified (Angluin, 1984). The notion of inductive
inference as search through a given space of hypotheses is helpful for
theoretical considerations. In practice, this space typically is only given
indirectly by the inference procedure and hypotheses are generated and
tested at each inference step.
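
Identification by enumeration can be sketched for a small, artificial class of languages. In the Python sketch below, description d_k (k = 1, 2, 3, ...) denotes the language of all strings a^n with n divisible by k, and the learner receives labeled strings as from an informant, so that "compatible" means agreement on both positive and negative examples. The language class, the labeled-pair encoding, and the function names are assumptions made for illustration.

```python
from itertools import count

def in_language(k, s):
    # L(d_k): strings consisting only of "a" whose length is divisible by k.
    return set(s) <= {"a"} and len(s) % k == 0

def compatible(k, examples):
    # d_k is compatible if it agrees with every labeled example seen so far.
    return all(in_language(k, s) == label for s, label in examples)

def identify_by_enumeration(training_sequence):
    # After each unit of information, guess the least compatible k.
    seen = []
    for unit in training_sequence:
        seen.append(unit)
        yield next(k for k in count(1) if compatible(k, seen))

# Informant presentation for the target language L(d_3).
sequence = [("aaa", True), ("a", False), ("aaaaaa", True), ("aa", False)]
guesses = list(identify_by_enumeration(sequence))
```

The guesses are 1, 3, 3, 3: the first negative example eliminates d_1 and d_2, and from then on the guess no longer changes. The inner search terminates only because the training sequence really comes from a language of the class; for arbitrary labels it could run forever.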

Gold (1967) provided fundamental results about which classes of lan- 
guages are identifiable in the limit given which learnability model. These 
results are summarized in table 6.4. Later work in the area of learning the- 
ory is mostly concerned with refining Gold’s results by identifying interesting 
subclasses of the original categorization of Gold and providing analyses of 
their learnability together with efficiency results (Zeugmann & Lange, 1995; 
Sakakibara, 1997). 



Table 6.4. Fundamental Results of Language Learnability (Gold, 1967, tab. 1)

Learnability Model                            Class of Languages

Primitive Recursive Text + Generator Naming   Recursively enumerable
                                              Recursive

Informant                                     Primitive recursive
                                              Context-sensitive
                                              Context-free
                                              Regular
                                              Superfinite

Text                                          Finite cardinality languages



Theorems and proofs concerning all results given in table 6.4 can be found 
in the appendix of Gold (1967). We just highlight some results, presenting 
the theorems for the class of finite cardinality languages (languages with a 
finite number of words) and for the class of superfinite languages, containing 
all languages of finite cardinality and at least one of infinite cardinality. 

Theorem 6.3.1 (Learnability of Finite Cardinality Languages). 

(Gold, 1967, theorem 1.6) Finite cardinality languages are identifiable from 
(arbitrary) text, that is, from positive examples presented in some arbitrary 
sequence, only. 

Proof (Learnability of Finite Cardinality Languages). Imagine a language
which consists of integers from 1 to a fixed number n. For any information
sequence i_1, i_2, ... the learner can assume that the language consists only
of the numbers seen so far. Since L is finite, after some time all elements
of L have occurred in the information sequence, such that the guess will be
correct. It is enough to give the proof for a language consisting of a
finite set of positive integers because all other finite languages can be
mapped onto this language.
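
The learner used in this proof is simple enough to write down. In the Python sketch below, a guess is represented directly as the finite set of elements seen so far (rather than as a grammar), which is an assumption made to keep the illustration short.

```python
def learn_finite_language(text):
    # Guess g_t: the language consists of exactly the elements seen so far.
    guess = frozenset()
    for x in text:
        guess = guess | {x}
        yield guess

# A text for L = {1, 2, 3}: positive examples only, arbitrary order,
# repetitions allowed, every element of L occurs at least once.
text = [2, 2, 1, 3, 1, 3, 2]
guesses = list(learn_finite_language(text))
```

From the fourth time step on, every guess equals {1, 2, 3}, so the language is identified in the limit. The same strategy fails as soon as the class also contains an infinite language, because the learner would then never be entitled to stop enlarging its guess.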

Theorem 6.3.2 (Learnability of Superfinite Languages). (Gold, 1967, 
theorem 1.9) Using information presentation by recursive text and the genera- 
tor-naming relation, any class of languages which contains all finite languages 
and at least one infinite language L is not identifiable in the limit. 

Proof (Learnability of Superfinite Languages). The infinite language L is
recursively enumerable (because otherwise it would not have a generator). It
can be shown that there exists a recursive sequence of positive integers which
ranges over L without repetitions. If the information sequence is presented
as recursive sequence a_1, a_2, ... then at some time step t, this
information sequence is a recursive text for the finite language
L = {a_1, ..., a_t}. At a later time step t' some additional element of the
infinite language can be presented and the information sequence now is a
recursive text for the finite language L' = {a_1, ..., a_t, a_{t'}}. From
this observation it follows that the learning system must change its guess
an infinite number of times.

Primitive recursive languages are learnable from an informant because
there exists an effective enumeration for such languages (Hopcroft &
Ullman, 1980, chap. 7). Because learning with an informant includes the
presentation of negative examples, these can be used to remove incompatible
hypotheses from search. While primitive recursive languages are
identifiable in the limit, it is not possible - even for regular
languages - to find a minimal description (a deterministic finite automaton
with a minimal number of states) in polynomial time (Gold, 1978)!

Most research in grammar inference is concerned with regular languages
and subclasses of them (such as regular pattern languages). Angluin (1981)
showed that a minimal deterministic finite automaton can be inferred in
polynomial time from positive and negative examples if, in addition to an
informant, an oracle answering membership queries is used. Angluin’s ID
algorithm is used, for example, by Schrödl and Edelkamp (1999) for inferring
recursive functions from user-generated traces. Boström (1998) showed how
discriminating predicates can be inferred from only positive examples using
an algorithm for inferring a regular grammar.

In chapter 7 we will show that the class of recursive programs addressed
by our approach to folding finite (“initial”) programs corresponds to
context-free tree grammars. Finite programs can be seen as primitive
recursive text, and folding, that is, generating a recursive program which
results in the given finite program as its f-th unfolding, corresponds to
generator-naming (see first line in tab. 6.4).

Footnote: More information about grammar inference, including algorithms
and areas of application, can be found via
http://www.cs.iastate.edu/~honavar/gi/gi.html.

6.3.2 Genetic Programming

Genetic programming as an evolutionary approach to program synthesis was
established by Koza (1992). The starting point is a population of randomly
generated computer programs. New programs - better adapted to the
“environment”, which is represented by examples together with a “fitness”
function - are generated by iterative application of Darwinian natural
selection and biologically inspired “reproduction” operations. Genetic
programming is applied in system identification, classification, control,
robotics, optimization, game playing, and pattern recognition.

6.3.2.1 Basic Concepts.

Representation and Construction of Programs. In genetic programming, typically
functional programs are synthesized. Koza (1992) synthesizes Lisp functions; an
application of genetic programming to the synthesis of ML functions was
presented by Olsson (1995). These programs are represented as
trees. A program is constructed as a syntactically correct expression over a 
set of predefined functions and a set of terminals. The set of functions can 
contain standard operators (e. g., arithmetic operators, boolean operators, 
conditional operators) as well as user-defined domain specific functions. For 
each function, its arity is given. The set of terminals contains variables and 
atoms. Atoms represent constants. Variables represent program arguments. 
They can be instantiated by values obtained for example from “sensors” . 

A random generation of a program typically starts with an element of 
the set of functions. The function becomes the root node in the program 
tree and it branches in the number of arcs given by its arity. For each open 
arc an element from the set of functions or terminals is introduced. Program 
generation terminates if each current node contains a terminal. Examples for 
program trees are given in figure 6.2. 
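
The random construction just described can be sketched in a few lines of Python. This is only an illustration: the function set, the arities, the terminal set, the depth limit, and the 50% chance of closing an open arc with a terminal are all assumptions made for the example and are not taken from Koza (1992).

```python
import random

# Hypothetical function set (name -> arity) and terminal set, for illustration only.
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "if": 3}
TERMINALS = ["x", "y", 0, 1]


def random_tree(rng, max_depth):
    """Grow a program tree whose root is always an element of the function set."""
    return _grow(rng, max_depth, force_function=True)


def _grow(rng, depth, force_function=False):
    # An open arc is closed with a terminal when the depth budget is used up,
    # or (assumed here) with probability 0.5 otherwise.
    if not force_function and (depth <= 0 or rng.random() < 0.5):
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    # The node branches into as many arcs as the arity of the chosen function.
    children = tuple(_grow(rng, depth - 1) for _ in range(FUNCTIONS[name]))
    return (name,) + children


print(random_tree(random.Random(0), 3))
```

A tree is represented as a nested tuple `(function, child, ...)`, so generation terminates exactly when every leaf is a terminal.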

Representation of Problems. A problem together with a “fitness” measure 
constitutes the “environment” for a program. Problems are represented by a 
set of examples. For some problems, this can be pairs of possible input val- 
ues, together with their associated outputs. The searched for program must 
transform possible inputs into desired outputs. Fitness can be calculated as 
the number of erroneous outputs generated by a program. Such problems 
can be seen as typical for inductive program synthesis. For learning to play 
a game successfully, the problem can consist of a set of possible constella- 
tions together with a scoring function. The searched for program represents 
a strategy for playing this game. For each constellation, the optimal move 
must be selected. Fitness can for example be calculated as the percentage 
of wins. For learning a plan, the input consists of a set of possible problem 
states. The searched-for program generates action sequences to transform a
problem state into the given goal state. Fitness can be calculated as the
percentage of problem states for which the goal state is reached. An overview of
a variety of problems is given in Koza (1992, tab. 2.1).

® The genetic programming home page is http://www.genetic-programming.org/.

Fig. 6.2. Construction of a Simple Arithmetic Function (a) and an Even-2-Parity
Function (b) Represented as a Labeled Tree with Ordered Branches (Koza, 1992,
figs. 6.1, 6.2)

Generational Loop. Initially, thousands of computer programs are randomly 
generated. This “initial population” corresponds to a blind, random search in 
the space of the given problem. The size of the search space is dependent on 
the size of the set of functions over which programs can be constructed. Each 
program corresponds to an “individual” whose “fitness” influences the prob- 
ability with which it will be selected to participate in the various “genetic 
operations” (described below). Because selection is not absolutely dependent
on fitness, but fitness just influences the probability of selection, genetic
programming is not a purely greedy algorithm. Each iteration of the generational
loop results in a new “generation” of programs. After many iterations, a pro- 
gram might emerge which solves or approximately solves the given problem. 
An abstract genetic programming algorithm is given in table 6.5. 
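
The remark that fitness only influences the probability of selection can be illustrated with a small sketch of fitness-proportionate selection. It assumes that larger fitness values are better and that fitness is non-negative; the function name `select` and the uniform fallback for an all-zero population are choices made for this example only.

```python
import random


def select(population, fitness, rng):
    """Draw one individual with probability proportional to its fitness."""
    weights = [fitness(individual) for individual in population]
    if sum(weights) == 0:
        # No usable signal (assumed fallback): choose uniformly at random.
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]


rng = random.Random(0)
scores = {"weak": 1, "strong": 3}
picks = [select(list(scores), scores.get, rng) for _ in range(1000)]
print(picks.count("strong"), picks.count("weak"))
```

The weaker individual is still chosen in a fraction of the draws, which is what distinguishes this scheme from a purely greedy choice of the best program.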

Table 6.5. Genetic Programming Algorithm (Koza, 1992, p. 77)

1. Generate an initial population of random compositions of the functions and
   terminals of the problem.

2. Iteratively perform the following sub-steps until the termination criterion
   has been satisfied:

   a) Execute each program in the population and assign it a fitness value
      according to how well it solves the problem.

   b) Select computer program(s) from the current population chosen with a
      probability based on fitness.

   c) Create a new population of computer programs by applying the following
      two primary operations:

      i. Copy program to the new population (Reproduction).

      ii. Create new programs by genetically recombining randomly chosen
          parts of two existing programs.

3. The best-so-far individual (program) is designated as result.

Genetic Operations. The genetic operations are:

Mutation: Delete a subtree of a program and grow a new subtree at its place
randomly.

This “asexual” operation is typically performed sparingly, for example
with a probability of 1% during each generation.

Crossover: For two programs (“parents”), in each tree a cross-over point is
chosen randomly and the subtree rooted at the cross-over point of the
first program is deleted and replaced by the subtree rooted at the cross-over
point of the second program.

This “sexual recombination” operation is the predominant operation in
genetic programming and is performed with a high probability (85% to
90%).

Reproduction: Copy a single individual into the next generation. 

An individual “survives” with, for example, 10% probability.

Architecture Alteration: Change the structure of a program. 

There are different structure changing operations which are applied spar- 
ingly (1% probability or below): 

Introduction of Subroutines: Create a subroutine from a part of the main 
program and create a reference between the main program and the 
new subroutine. 

Deletion of Subroutines: Delete a subroutine; thereby making the hier- 
archy of subroutines narrower or shallower. 

Subroutine Duplication: Duplicate a subroutine, give it a new name and 
randomly divide the preexisting calls of the subroutine between the 
old and the new one. (This operation preserves semantics. Later on, 
each of these subroutines might be changed, for example by muta- 
tion.) 

Argument Duplication: Duplicate an argument of a subroutine and ran- 
domly divide internal references to it. (This operation is also seman- 
tics preserving. It enlarges the dimensionality of the subroutine.) 
Argument Deletion: Delete an argument, thereby reducing the amount
of information available to a subroutine (“generalization”).

Automatically Defined Iterations/Recursions: Introduce or delete iterations
(ADIs) or recursive calls (ADRs).

Introduction of iterations or recursive calls might result in non-termination.
Typically, the number of iterations (or recursions) is
restricted for a problem. That is, for each problem, each program
has a time-out criterion and is terminated “from outside” after a
certain number of iterations.

Automatically Defined Stores: Introduce or delete memory (ADSs). 
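
The two basic tree operations from the list above, crossover and mutation, can be sketched as follows. Programs are assumed to be nested tuples `(function, child, ...)`; the helper names `paths`, `get`, and `put` are hypothetical, and crossover points are chosen uniformly over all nodes, which is a simplification made for this example.

```python
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the subtree rooted at the given path."""
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, new):
    """Return a copy of tree in which the subtree at path is replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(first, second, rng):
    """Replace a random subtree of the first parent by a random subtree of the second."""
    p1 = rng.choice(list(paths(first)))
    p2 = rng.choice(list(paths(second)))
    return put(first, p1, get(second, p2))


def mutate(tree, rng, new_subtree):
    """Delete a random subtree and grow a new one (supplied by new_subtree) in its place."""
    return put(tree, rng.choice(list(paths(tree))), new_subtree(rng))


a = ("+", "x", ("*", "y", 1))
b = ("-", 0, "x")
print(crossover(a, b, random.Random(1)))
print(mutate(a, random.Random(2), lambda rng: "z"))
```

Because trees are immutable tuples, both operations return new programs and leave the parents unchanged, which matches the idea that parents may also be copied into the next generation.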

Quality criteria for programs synthesized by genetic programming are
correctness (defined as 100% fitness for the given examples), efficiency,
and parsimony. The latter two criteria can additionally be coded in the fitness
measure. In the following, we give examples of program synthesis by genetic
programming.

6.3.2.2 Iteration and Recursion. First we describe how a plan for stacking
blocks in a predefined order can be synthesized where the final program
involves iteration of actions (Koza, 1992, chap. 18.1). This problem is similar
to the tower problem described in section 3.1.4.5: An initial problem state
consists of n blocks which can be stacked or lying on the table. Problem solving
operators are puttable(x) and put(x,y), the first operator defining how a
block x is moved from another block onto the table and the second operator
defining how a block x can be put on top of another block y. The problem
solving goal defines in which sequence the n blocks should be stacked into a
single tower (see fig. 6.3).
Fig. 6.3. A Possible Initial State, an Intermediate State, and the Goal State for
Block Stacking (Koza, 1992, figs. 18.1, 18.2)



The set of terminals is T = {CS, TB, NN} and the set of functions is
{MS, MT, NOT, EQ, DU} with

CS: A sensor that dynamically specifies the top block of the Stack.

TB: A sensor that dynamically specifies the block in the Stack which together
with all blocks under it is already positioned correctly (“top correct
block”).

NN: A sensor that dynamically specifies the block which must be stacked
immediately on top of TB according to the goal (“next needed block”).

MS: A move-to-stack operator with arity one which moves a block from the
Table on top of the Stack.

MT: A move-to-table operator with arity one which moves a block from the
top of the Stack to the Table.

NOT: A boolean operator with arity one switching the truth value of its
argument.

EQ: A boolean operator with arity two which returns true if its arguments
are identical and false otherwise.

DU: A user-defined iterative “do-until” operator with arity two. The
expression DU *Work* *Predicate* causes *Work* to be iteratively executed
until *Predicate* is satisfied.

All functions have defined outputs for all conditions: MS and MT change
the Table and Stack as side effect. They return true if the operator can be
applied successfully and nil (false) otherwise. The return value of DU is also
a boolean value indicating whether *Predicate* is satisfied or whether the
DU operator timed out.

For this example, the set of terminals and the set of functions are care- 
fully crafted. Especially the pre-defined sensors carry exactly that information 
which is relevant for solving the problem! The problem is more restricted than 
the classical blocks-world problem: While in the blocks-world, any constella- 
tion of blocks (for example, four stacks containing two blocks each and one 
with only one block) is possible (see tab. 2.3 for the growth of the number of 
states in dependence of the number of blocks), in the block-stacking problem, 
only one stack is allowed and all other blocks are lying on the table. 

For the evolution of the desired program, 166 fitness cases were constructed:
ten cases where zero to all nine blocks in the stack were already
ordered correctly; eight cases where there is one out-of-order block on top of
the stack; and a random sampling of 148 additional cases. Fitness was measured
as the number of correctly handled cases. Koza (1992) reports three
variants for the synthesis of a block stacking program, summarized in figure
6.4. The first correct program first moves all blocks onto the table and then
constructs the correct stack. This program is not very efficient because
unnecessary moves from the stack to the table are made for partially correct
stacks. Over all 166 cases, this function generates 2319 moves in contrast to
1641 necessary moves. In the next trial, efficiency was integrated into the
fitness measure and, as a consequence, a function performing only the minimum
number of moves emerged. But this function has an outer loop which is
not necessary. By integrating parsimony into the fitness measure, the correct,
efficient, and parsimonious function is generated.
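
To make the difference between the first correct program and the parsimonious one concrete, the following Python sketch interprets both on a three-block example. Several details are assumptions made for this sketch rather than facts from the source: MT is taken to move the top block only if its argument is somewhere in the stack, DU is taken to test its predicate before each execution of the work, the time-out of 25 iterations is arbitrary, and the class and function names are hypothetical.

```python
class BlockWorld:
    """One stack plus a table; sensor and operator semantics are assumptions."""

    def __init__(self, goal, stack, table):
        self.goal = list(goal)      # desired stack, bottom to top
        self.stack = list(stack)    # current stack, bottom to top
        self.table = set(table)     # blocks lying on the table
        self.moves = 0

    def _correct(self):
        """Number of blocks at the bottom of the stack that already match the goal."""
        k = 0
        while k < min(len(self.stack), len(self.goal)) and self.stack[k] == self.goal[k]:
            k += 1
        return k

    def CS(self):                   # top block of the stack, or None
        return self.stack[-1] if self.stack else None

    def TB(self):                   # top correct block, or None
        k = self._correct()
        return self.stack[k - 1] if k else None

    def NN(self):                   # next needed block, or None if the goal is reached
        k = self._correct()
        return self.goal[k] if k < len(self.goal) else None

    def MS(self, block):            # move a block from the table onto the stack
        if block not in self.table:
            return False
        self.table.remove(block)
        self.stack.append(block)
        self.moves += 1
        return True

    def MT(self, block):            # move the top of the stack onto the table
        if block is None or block not in self.stack:
            return False
        self.table.add(self.stack.pop())
        self.moves += 1
        return True


def run(expr, world, limit=25):
    """Evaluate a program given as a nested tuple; strings are sensor readings."""
    if isinstance(expr, str):
        return getattr(world, expr)()
    op, *args = expr
    if op == "DU":                  # do *Work* until *Predicate* (predicate tested first)
        work, predicate = args
        for _ in range(limit):
            if run(predicate, world, limit):
                return True
            run(work, world, limit)
        return False                # timed out
    values = [run(a, world, limit) for a in args]
    if op == "NOT":
        return not values[0]
    if op == "EQ":
        return values[0] == values[1]
    return getattr(world, op)(values[0])    # MS or MT


FIRST = ("EQ", ("DU", ("MT", "CS"), ("NOT", "CS")), ("DU", ("MS", "NN"), ("NOT", "NN")))
THIRD = ("EQ", ("DU", ("MT", "CS"), ("EQ", "CS", "TB")), ("DU", ("MS", "NN"), ("NOT", "NN")))

for program in (FIRST, THIRD):
    world = BlockWorld("ABC", ["A", "C"], {"B"})
    run(program, world)
    print(world.stack, world.moves)
```

Under these assumptions, the first program clears the whole stack before rebuilding it, while the third stops unstacking as soon as the top block is the top correct block and therefore needs fewer moves on the same case.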

Population Size: M = 500
Fitness Cases: 166

Correct Program:
Fitness: 166 - number of correctly handled cases
Termination: Generation 10

(EQ (DU (MT CS) (NOT CS))
    (DU (MS NN) (NOT NN)))

Correct and Efficient Program:
Fitness: 0.75 · C + 0.25 · E with
C = (number of correctly handled cases / 166) · 100
E = f(n) as function of the total number of moves n over all 166 cases:
  f(n) = 100 for the analytically obtained minimal number of moves for a
  correct program (min = 1641);
  f(n) linearly scaled upwards for zero moves up to 1640 moves with f(0) = 0;
  f(n) linearly scaled downwards for 1642 moves up to 2319 moves (obtained by
  the first correct program) and f(n) = 0 for n > 2319
Termination: Generation 11

(DU (EQ (DU (MT CS) (EQ CS TB))
        (DU (MS NN) (NOT NN)))
    (NOT NN))

Correct, Efficient, and Parsimonious Program:
Fitness: 0.70 · C + 0.20 · E + 0.10 · (number of nodes in program tree)
Termination: Generation 1

(EQ (DU (MT CS) (EQ CS TB))
    (DU (MS NN) (NOT NN)))

Fig. 6.4. Resulting Programs for the Block Stacking Problem (Koza, 1992,
chap. 18.1)

As an example for synthesizing a recursive program, Koza (1992, chap. 18.3)
shows how a function calculating Fibonacci numbers can be generated from
examples for the first twenty numbers. The problem is treated as a sequence
induction problem: for an ascending order of index positions J, the function
f(J) must be detected. The terminal set is T = {J, 0, 1, 2, 3} and the
function set is F = {+, -, *, SRF}. Terminal J represents the input value.
Function SRF is a “sequence referencing function”: (SRF K D) returns
either the Fibonacci number for K or a default value D. The strategy used is
that the Fibonacci numbers are calculated for K = 0 ... 20 in ascending order
and each result is stored in a table. Thus, SRF has access to the Fibonacci
numbers for all K which were already calculated, returning the Fibonacci
number for K ≤ J - 1 and default value D otherwise. For a population size
of 2000 programs, in generation 22, the program given in table 6.6 emerged.
A re-implementation in Lisp together with a simplified version is given in
appendix C.1. Function SRF realizes recursion in an indirect way: each
new element in the sequence is defined recursively in terms of one or more
previous elements.

Table 6.6. Calculating the Fibonacci Sequence

(+ (SRF (- J 2) 0)
   (SRF (+ (+ (- J 2) 0) (SRF (- J J) 0))
        (SRF (SRF 3 1) 1)))

Other treatments of recursion in genetic programming can be found for
example in Olsson (1995) and Yu and Clark (1998).
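
The way SRF realizes recursion indirectly can be checked with a direct transcription of the expression in table 6.6 into Python. The sketch assumes that (SRF K D) returns the stored value for 0 ≤ K ≤ J - 1 and the default D for every other K; the names `evolved` and `sequence` are hypothetical.

```python
def evolved(J, SRF):
    """Transcription of the evolved expression of table 6.6."""
    # (+ (SRF (- J 2) 0)
    #    (SRF (+ (+ (- J 2) 0) (SRF (- J J) 0))
    #         (SRF (SRF 3 1) 1)))
    return SRF(J - 2, 0) + SRF((J - 2) + 0 + SRF(J - J, 0), SRF(SRF(3, 1), 1))


def sequence(n):
    """Compute the first n sequence elements in ascending order of J."""
    table = []

    def SRF(K, D):
        # Only positions 0 .. J-1 have been calculated so far; otherwise use the default.
        return table[K] if 0 <= K < len(table) else D

    for J in range(n):
        table.append(evolved(J, SRF))
    return table


print(sequence(10))
```

For J ≥ 2 the expression reduces to f(J - 2) + f(J - 1), because (SRF (- J J) 0) yields the stored first element; the default values only matter for the first two positions.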

6.3.2.3 Evaluation of Genetic Programming. Genetic programming is
a model-driven approach to learning (see sect. 6.3.1.1), that is, it starts with a
set of programs (hypotheses); these are checked against the data and modified
accordingly. Because the method relies on (nearly) blind search, the time
effort for coming up with a program solving the desired task is very high.
Program construction by generate-and-test has next to nothing in common
with the way in which a human programmer develops program code, which
(hopefully) is guided by the given (complete or incomplete) specification.

Furthermore, effort and success of program construction are heavily
dependent on the given representation of the problem and on the pre-defined
set of functions and terminals. For the block-stacking example given above,
knowledge about which information in a constellation of blocks is relevant for
choosing an action was explicitly coded in the sensors. In chapter 8, we will
show that our own approach allows inferring which information (predicates)
is relevant for solving a problem.

6.3.3 Inductive Logic Programming 

Inductive logic programming (ILP) was named by Muggleton (1991) because
this area of research combines inductive learning and logic programming.
Most work done in this area concerns concept learning, that is, induction
of non-recursive classification rules (see sect. 6.3.1.1). In principle, all ILP
approaches can be applied to learning recursive clauses if the name of the
to-be-learned relation may be used when constructing the body which
characterizes this relation. But without special
heuristics, neither termination of the learning process nor termination of the
learned clauses can be guaranteed. In the following, we present basic concepts
of ILP, three kinds of learning mechanisms, and biases used in ILP. Finally,
we refer to some ILP systems which address learning of recursive clauses. An
introduction to ILP is given, for example, by Lavrač and Džeroski (1994) and
Muggleton and De Raedt (1994); an overview of program synthesis and ILP
is given by Flener (Flener, 1995; Flener & Yilmaz, 1999).

In the following, we give illustrations of the basic concepts and learning
algorithms using a simple classification problem presented, for example, in
Lavrač and Džeroski (1994). The to-be-learned concept is the relation
daughter(X,Y) (see tab. 6.7).

6.3.3.1 Basic Concepts. In ILP, examples and hypotheses are represented
as subsets of first order logic such as definite Horn clauses with some
additional restrictions (see discussion of biases below). Definite Horn clauses have
the form $T \lor \lnot L_1 \lor \ldots \lor \lnot L_n$ or, in Prolog notation,
$T \leftarrow L_1, \ldots, L_n$. The
positive literal $T$ is called head of the clause and represents the to-be-learned
relation. Often, a clause is represented as a set of literals.
Table 6.7. Learning the daughter Relation

Training Examples:
$\mathcal{E}^{+}$ = {e1 = daughter(mary, ann), e2 = daughter(eve, tom)}
$\mathcal{E}^{-}$ = {daughter(tom, ann), daughter(eve, ann)}

Background Knowledge:
B = {parent(ann, mary), parent(ann, tom), parent(tom, eve), parent(tom, ian),
     female(ann), female(mary), female(eve)}

To be learned clause: (learning methods described below)
daughter(X, Y) ← female(X), parent(Y, X)



Examples typically can be divided into positive and negative examples.
In addition to the examples, background knowledge can be provided (see
sect. 6.3.1.1, table 6.2). The goal of induction is to obtain a hypothesis which,
together with the background knowledge, is complete and consistent with
respect to the examples:

Definition 6.3.6 (Complete and Consistent Hypotheses). Given a set
of training examples $\mathcal{E} = \mathcal{E}^{+} \cup \mathcal{E}^{-}$, with $\mathcal{E}^{+}$ as positive examples and $\mathcal{E}^{-}$ as
negative examples, together with background knowledge $B$ and a hypothesis
$\mathcal{H}$, let $covers(B \cup \mathcal{H}, \mathcal{E})$ be a function which returns all elements of $\mathcal{E}$ which
follow from $B \cup \mathcal{H}$. The following conditions must hold:

Completeness: $covers(B \cup \mathcal{H}, \mathcal{E}^{+}) = \mathcal{E}^{+}$.

Each positive example in $\mathcal{E}^{+}$ is covered by hypothesis $\mathcal{H}$ together with
the background knowledge $B$. In other words, the positive examples follow
from hypothesis and background knowledge ($B \land \mathcal{H} \models \mathcal{E}^{+}$).

Consistency: $covers(B \cup \mathcal{H}, \mathcal{E}^{-}) = \emptyset$.

No negative example in $\mathcal{E}^{-}$ is covered by hypothesis $\mathcal{H}$ together with the
background knowledge $B$. In other words, no negative example follows
from hypothesis and background knowledge ($B \land \mathcal{H} \land \mathcal{E}^{-} \not\models \Box$).

In logic programming (Sterling & Shapiro, 1986), SLD-resolution can be
used to check for each example in $\mathcal{E}$ whether it is entailed by the hypothesis
together with the background knowledge. An additional requirement is that
the positive examples do not already follow from the background knowledge
alone, because in that case learning (construction of a hypothesis) is not
necessary (“prior necessity”, Muggleton & De Raedt, 1994).
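
The completeness and consistency conditions can be checked mechanically for the daughter example of table 6.7. The following Python sketch is a much simplified stand-in for SLD-resolution: it assumes a single non-recursive clause whose body is proved against ground facts only. Literals are tuples, strings starting with an upper-case letter are variables, and all function names are hypothetical.

```python
def is_var(term):
    return term[:1].isupper()


def match(literal, fact, theta):
    """Extend substitution theta so that literal equals the ground fact, else None."""
    if literal[0] != fact[0] or len(literal) != len(fact):
        return None
    theta = dict(theta)
    for a, b in zip(literal[1:], fact[1:]):
        if is_var(a):
            if theta.setdefault(a, b) != b:     # variable already bound differently
                return None
        elif a != b:
            return None
    return theta


def prove(body, facts, theta):
    """True if all body literals can be matched against facts under one substitution."""
    if not body:
        return True
    return any(prove(body[1:], facts, t)
               for fact in facts
               for t in [match(body[0], fact, theta)] if t is not None)


def covers(clause, facts, examples):
    """Return the examples that follow from the clause and the background facts."""
    head, body = clause
    covered = set()
    for example in examples:
        theta = match(head, example, {})
        if theta is not None and prove(body, facts, theta):
            covered.add(example)
    return covered


B = {("parent", "ann", "mary"), ("parent", "ann", "tom"), ("parent", "tom", "eve"),
     ("parent", "tom", "ian"), ("female", "ann"), ("female", "mary"), ("female", "eve")}
E_POS = {("daughter", "mary", "ann"), ("daughter", "eve", "tom")}
E_NEG = {("daughter", "tom", "ann"), ("daughter", "eve", "ann")}
H = (("daughter", "X", "Y"), [("female", "X"), ("parent", "Y", "X")])

print(covers(H, B, E_POS) == E_POS)     # completeness
print(covers(H, B, E_NEG) == set())     # consistency
```

A clause consisting of the head alone would cover every example and therefore be complete but not consistent, which is the situation the refinement search described later starts from.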

All possible hypotheses, that is, all definite Horn clauses with the
to-be-learned relation as head, which can be constructed from the examples
and the background knowledge constitute the so-called hypothesis space or
search space for induction. For a systematic search of the hypothesis space,
it is useful to introduce a partial order over clauses, based on θ-subsumption
(Plotkin, 1969):

Definition 6.3.7 (θ-Subsumption). Let c and c′ be two program clauses.
Clause c θ-subsumes clause c′ if there exists a substitution θ such that
$c\theta \subseteq c'$. Two clauses c and d are θ-subsumption equivalent if c θ-subsumes d and
if d θ-subsumes c. A clause is reduced if it is not θ-subsumption equivalent
to any proper subset of itself.

An example is given in figure 6.5. 



c = daughter(X, Y) ← parent(Y, X), parent(W, V)
d = daughter(X, Y) ← parent(Y, X)

c and d are θ-subsumption equivalent:
cθ ⊆ d and dθ ⊆ c with θ = {W/Y, V/X}

c is not reduced (d is a proper subset of c).
d is reduced.

Fig. 6.5. θ-Subsumption Equivalence and Reduced Clauses
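
The equivalence claimed in figure 6.5 can be verified with a small backtracking test for θ-subsumption. In this sketch a clause is a list of literals `(sign, predicate, arguments)`, with "+" marking the head and "-" marking body literals; variables of the second clause are treated like constants. The representation and the function names are assumptions made for this example.

```python
def is_var(term):
    return term[:1].isupper()


def match(literal, target, theta):
    """Extend theta so that literal instantiated by theta equals target, else None."""
    sign, pred, args = literal
    if (sign, pred) != target[:2] or len(args) != len(target[2]):
        return None
    theta = dict(theta)
    for a, b in zip(args, target[2]):
        if is_var(a):
            if theta.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return theta


def subsumes(c, d):
    """True if some substitution maps every literal of c onto a literal of d."""
    def search(literals, theta):
        if not literals:
            return True
        return any(search(literals[1:], t)
                   for target in d
                   for t in [match(literals[0], target, theta)] if t is not None)
    return search(list(c), {})


c = [("+", "daughter", ("X", "Y")), ("-", "parent", ("Y", "X")), ("-", "parent", ("W", "V"))]
d = [("+", "daughter", ("X", "Y")), ("-", "parent", ("Y", "X"))]

print(subsumes(c, d), subsumes(d, c))
```

Both tests succeed, so c and d are θ-subsumption equivalent; the substitution found for the first test binds W to Y and V to X, as stated in the figure.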



Based on θ-subsumption, we can introduce a syntactic notion of generality:
If $c\theta \subseteq c'$, then c is at least as general as c′, written $c \leq c'$. If $c \leq c'$ holds
and $c' \leq c$ does not hold, c is a generalization of c′ and c′ is a specialization
(refinement) of c. The relation $\leq$ introduces a lattice on the set of reduced
clauses. That is, any two clauses have a least upper bound and a greatest
lower bound which are unique except for variable renaming (see figure 6.6).

Based on θ-subsumption, the least general generalization of two clauses
can be defined (Plotkin, 1969)®: 

Definition 6.3.8 (Least General Generalization). The least general
generalization (lgg) of two reduced clauses c and d, written lgg(c, d), is the
least upper bound of c and d in the θ-subsumption lattice.

Note that calculating the resolvent of two clauses is the inverse process
to calculating an lgg: When calculating an lgg, two clauses are generalized
by keeping their θ-subsumption equivalent common subset of literals. When
calculating the resolvent, two clauses are integrated into one, more special
clause where variables occurring in the given clauses might be replaced by
terms (see description of resolution in sect. 2.2.2 in chap. 2 and above in
this chapter). With θ-subsumption as the central concept for ILP we have
the basis for characterizing ILP learning techniques as either bottom-up
construction of least general generalizations or top-down search for refinements
(see fig. 6.6).
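
The core step of computing an lgg, generalizing two literals by replacing differing sub-terms with variables, can be sketched as follows. This is only the literal-level building block (first-order anti-unification), not Plotkin's complete algorithm for clauses. Terms are nested tuples `(functor, argument, ...)` or constant strings, and the generated variable names `V0`, `V1`, ... are an arbitrary choice.

```python
def lgg(s, t, table=None):
    """Least general generalization of two terms or literals.

    The table maps each pair of differing sub-terms to one variable, so that the
    same pair is always generalized by the same variable.
    """
    table = {} if table is None else table
    if s == t:
        return s
    same_shape = (isinstance(s, tuple) and isinstance(t, tuple)
                  and s[0] == t[0] and len(s) == len(t))
    if same_shape:
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    return table.setdefault((s, t), "V%d" % len(table))


shared = {}
print(lgg(("daughter", "mary", "ann"), ("daughter", "eve", "tom"), shared))
print(lgg(("female", "mary"), ("female", "eve"), shared))
print(lgg(("parent", "ann", "mary"), ("parent", "tom", "eve"), shared))
```

Sharing one table across the three calls keeps the variables consistent: the pair (mary, eve) always becomes V0 and the pair (ann, tom) always becomes V1, so the three generalized literals together read daughter(V0, V1), female(V0), parent(V1, V0).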

6.3.3.2 Basic Learning Techniques. In the following we present two
techniques for bottom-up generalization, calculating relative lggs and inverse
resolution, and one technique for top-down search in a refinement graph.



® We will come back to this concept applied to terms (instead of logical formulae) 
under the name of first-order anti-unification in chapter 7 and in part III. 
Fig. 6.6. θ-Subsumption Lattice



Relative Least General Generalization. Relative least general generalization
is an extension of lgg with respect to background knowledge, for example
used in Golem (Muggleton & Feng, 1990).

Definition 6.3.9 (Relative Least General Generalization). The relative
least general generalization (rlgg) of two clauses c1 and c2 is their
least general generalization lgg(c1, c2) relative to the background knowledge
B: rlgg(c1, c2) = lgg(c1 ← B, c2 ← B).

In the ILP literature, enriching clauses with background knowledge is 
discussed under the name of saturation (Rouveirol, 1991). 

An algorithm for calculating lggs for clauses is, for example, given in
Plotkin (1969). In table 6.8 we demonstrate calculating rlggs with the daughter
example (tab. 6.7). The rlgg of the two positive examples is calculated
by determining the lgg between Horn clauses where the examples are
consequences of the background knowledge.



Table 6.8. Calculating an rlgg 

rlgg(daughter(mary, ann), daughter(eve, tom)) =
lgg(daughter(mary, ann) ← B, daughter(eve, tom) ← B) =
daughter(X, Y) ← female(X), parent(Y, X)



Inverse Resolution. Inverse resolution is a generalization technique based on
inverting the resolution rule of deductive inference. Inverse resolution requires
inverse substitution:

Definition 6.3.10 (Inverse substitution). For a logical formula W, an
inverse substitution $\theta^{-1}$ of a substitution $\theta$ is a function that maps terms in
$W\theta$ to variables, such that $W\theta\theta^{-1} = W$.

Again, we do not present the algorithm, but illustrate inverse resolution
with the daughter example introduced in table 6.7 (see fig. 6.7). We
restrict our positive examples to e1 = daughter(mary, ann) and background
knowledge to b1 = female(mary) and b2 = parent(ann, mary). Working with
bottom-up generalization, the initial hypothesis is given as the first positive
example. The learning process starts, for example, with clauses e1 and b2.
Inverse resolution attempts to find a clause c1 such that c1 together with b2
entails e1. If such a clause can be found, it becomes the current hypothesis
and a further clause from the set of positive examples or the background
knowledge is considered. An attempt is made to inversely resolve the current
hypothesis with this new clause, and so on. The inverse resolution operator used
in the example is called absorption or V operator.



c2 = daughter(X, Y) ← female(X), parent(Y, X)
c1 = daughter(mary, Y) ← parent(Y, mary)
e1 = daughter(mary, ann)

Fig. 6.7. An Inverse Linear Derivation Tree (Lavrač and Džeroski, 1994, pp. 46)

In general, inverse resolution involves backtracking. In systems working
with inverse resolution, it is sometimes necessary to introduce new predicates
in the hypothesis for constructing a complete and consistent hypothesis. The
operator realizing such a necessary predicate invention is called W operator
(Muggleton, 1994). Inverse resolution is for example used in Marvin (Sammut
& Banerji, 1986).

Search in the Refinement Graph. Alternatively to bottom-up generalization,
hypotheses can be constructed by top-down refinement. The step from a
general to a more specific hypothesis is realized by a so-called refinement
operator:

Definition 6.3.11 (Refinement Operator). Given a hypothesis language
L (e.g., definite Horn clauses), a refinement operator ρ maps a clause c to
a set of clauses $\rho(c) = \{c' \mid c' \in L, c < c'\}$.

Typically, a refinement operator computes so-called “most general
specializations” under θ-subsumption by performing one of the following syntactic
operations:

- apply a substitution to c,

- add a literal to the body of c.

Learning as search in refinement graphs was first introduced by Shapiro
(1983) in the interactive system MIS (Model Inference System). An outline
of the MIS algorithm is given in table 6.9.



Table 6.9. Simplified MIS Algorithm (Lavrač and Džeroski, 1994, pp. 54)

Initialize hypothesis H to a (possibly empty) set of clauses in L.
REPEAT
    Read next (positive or negative) example.
    REPEAT
        IF there exists a covered negative example e
        THEN delete incorrect clauses from H.
        IF there exists a positive example e not covered by H
        THEN with breadth-first search of the refinement graph, develop a
             clause c which covers e and add it to H.
    UNTIL H is complete and consistent.
    Output: Hypothesis H.
FOREVER.



Again, we illustrate the algorithm with the daughter example (tab. 6.7).
A part of the breadth-first search tree (see chap. 2) is given in figure 6.8.
The root node of a refinement graph is formally the most general clause,
that is, false or the empty clause. Typically, a refinement algorithm starts
with the most general definition of the goal relation; for our example, that is
H = c = daughter(X, Y) ← true. The hypothesis covers both positive training
examples, that is, it need not be modified if these examples are presented
initially. If a negative example (e.g., e3 = daughter(tom, ann)) is presented,
the hypothesis needs to be modified because c covers the negative examples,
too. Therefore, a clause covering the positive examples but not covering e3
must be constructed. The set of all possible (minimal) specializations of c is
calculated as ρ(c) = {daughter(X, Y) ← L}. The newly introduced literal L is
either a literal which has the variables used in the clausal head as arguments
or a literal introducing a new variable. The predicate names of the literals
can be obtained from the background knowledge; additional built-in predicates
(such as equality or inequality) can be used.

Already for this small example, the search space becomes rather large: L
can be

- X = Y, or female(X), or female(Y), or parent(X, X), or parent(X, Y),
  or parent(Y, X), or parent(Y, Y), or

- parent(X, Z), or parent(Z, X), or parent(Y, Z), or parent(Z, Y), with Z ≠ X
  and Z ≠ Y.

Note that in this example the language is restricted such that literals with
the goal predicate daughter are not considered in the body of the clause and,
therefore, no recursive clauses are generated.

The refinement c′ = daughter(X, Y) ← female(X) covers the positive
examples and does not cover e3. Therefore, it becomes the new hypothesis.
If another negative example, such as e4 = daughter(eve, ann), is presented,
the hypothesis again must be modified, because c′ covers this example, and
so on.
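
The search through the refinement graph for the daughter example can be sketched in Python. The sketch is a strong simplification of MIS: it only adds literals to the body (no substitutions), searches for a single clause, and processes all examples at once rather than interactively. The candidate literals are exactly the eleven listed above; all function names are hypothetical.

```python
from itertools import product

FACTS = {("parent", "ann", "mary"), ("parent", "ann", "tom"), ("parent", "tom", "eve"),
         ("parent", "tom", "ian"), ("female", "ann"), ("female", "mary"), ("female", "eve")}
POS = [("daughter", "mary", "ann"), ("daughter", "eve", "tom")]
NEG = [("daughter", "tom", "ann"), ("daughter", "eve", "ann")]


def solutions(body, theta):
    """Yield every substitution extending theta that makes all body literals true."""
    if not body:
        yield theta
        return
    pred, *args = body[0]
    if pred == "=":                                 # built-in equality on head variables
        if theta[args[0]] == theta[args[1]]:
            yield from solutions(body[1:], theta)
        return
    for fact in FACTS:
        if fact[0] != pred or len(fact) != len(body[0]):
            continue
        new = dict(theta)
        if all(new.setdefault(a, b) == b for a, b in zip(args, fact[1:])):
            yield from solutions(body[1:], new)


def covers(body, example):
    """True if daughter(X, Y) <- body covers the ground example."""
    theta = {"X": example[1], "Y": example[2]}
    return next(solutions(body, theta), None) is not None


def refinements(body):
    """All bodies obtained by adding one literal over X, Y and a new variable Z."""
    literals = [("=", "X", "Y"), ("female", "X"), ("female", "Y")]
    literals += [("parent", a, b) for a, b in product("XYZ", repeat=2) if (a, b) != ("Z", "Z")]
    return [body + [lit] for lit in literals if lit not in body]


def search(pos, neg, max_literals=2):
    """Breadth-first search for a body that is complete and consistent."""
    frontier = [[]]                                 # start with daughter(X, Y) <- true
    for _ in range(max_literals + 1):
        for body in frontier:
            if all(covers(body, e) for e in pos) and not any(covers(body, e) for e in neg):
                return body
        frontier = [r for body in frontier for r in refinements(body)]
    return None


print(len(refinements([])))
print(search(POS, NEG[:1]))
print(search(POS, NEG))
```

With only the negative example daughter(tom, ann), the search stops at the body female(X); with both negative examples it has to descend one level further and arrives at female(X), parent(Y, X).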




Fig. 6.8. Part of a Refinement Graph (Lavrac and Dzeroski, 1994, p. 56) 



A current system based on top-down refinement is Foil (Quinlan, 1990).

6.3.3.3 Declarative Biases. The most general setting for learning
hypotheses in an ILP framework would be to allow that the hypothesis can
be represented as an arbitrary set of definite Horn clauses, that is, any legal
Prolog program. Because ILP techniques are heavily dependent on search
in hypothesis space, typically much more restricted hypothesis languages are
considered. In the following, we present such language restrictions, also called
syntactic biases (see sect. 6.3.1.1). Additionally, semantic biases can be
introduced to restrict the to-be-learned relations.

Syntactic Biases. One possibility to restrict the search space is to provide
second order schemes which represent the form of searched-for hypotheses
(Flener, 1995). A second order scheme is a clause with existentially quantified
predicate variables. An example scheme and an instantiation of such a scheme
are (Muggleton & De Raedt, 1994, p. 656):

Scheme: ∃p, q, r : p(X, Y) ← q(X, XW), q(Y, YW), r(XW, YW).
Instantiation: connected(X, Y) ← partof(X, XW), partof(Y, YW),
touches(XW, YW).

A typical syntactic bias is to restrict search to linked clauses: 

Definition 6.3.12 (Linked Clause). A clause is linked if all of its
variables are linked. A variable V is linked in a clause c iff V occurs in the head
of c, or there exists a literal l in c that contains variables V and W, V ≠ W,
and W is linked in c.

The scheme given above represents a linked clause.

Many ILP systems provide a number of parameters which can be set
by the user. These parameters represent biases, and by assigning different
values to them, so-called bias shifts can be realized.

Two examples of such parameters are the depth of terms and the level of 
terms. 

Definition 6.3.13 (Depth of a Term). The depth $d(V)$ of a variable is
0. The depth $d(c)$ of a constant is 1. The depth $d(f(t_1, \ldots, t_n))$ of a term is
$1 + \max(\{d(t_i)\})$.

Definition 6.3.14 (Level of a Term). The level $l(t)$ of a term t in a
linked clause c is 0 if t occurs as an argument in the head of c, and
$1 + \min(\{l(s)\})$ where s and t occur as arguments in the same literal of
c.



In the scheme given above, X and Y have level 0 and XW and YW
have level 1. For linked languages with maximal depth 1 and level > 1, the
number of literals in the body of the clause can grow exponentially with its
level (Muggleton & De Raedt, 1994).
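
Definitions 6.3.13 and 6.3.14 translate almost directly into code. The following Python sketch assumes that compound terms are nested tuples `(functor, argument, ...)`, that strings starting with an upper-case letter are variables, and that all other strings are constants; the level is computed by iterating until no term can be assigned a smaller value. The function names are hypothetical.

```python
def depth(term):
    """Depth of a term: variable 0, constant 1, f(t1, ..., tn) 1 + max depth of the ti."""
    if isinstance(term, tuple):
        return 1 + max(depth(argument) for argument in term[1:])
    return 0 if term[:1].isupper() else 1


def levels(head, body):
    """Level of every argument term of a linked clause (head terms have level 0)."""
    level = {term: 0 for term in head[1:]}
    changed = True
    while changed:
        changed = False
        for literal in body:
            for t in literal[1:]:
                # Levels already known for the other terms in the same literal.
                others = [level[s] for s in literal[1:] if s != t and s in level]
                if others and 1 + min(others) < level.get(t, float("inf")):
                    level[t] = 1 + min(others)
                    changed = True
    return level


scheme_head = ("p", "X", "Y")
scheme_body = [("q", "X", "XW"), ("q", "Y", "YW"), ("r", "XW", "YW")]
print(levels(scheme_head, scheme_body))
print(depth("X"), depth("ann"), depth(("f", "X", ("g", "a"))))
```

Applied to the scheme p(X, Y) ← q(X, XW), q(Y, YW), r(XW, YW), the head variables receive level 0 and the two variables introduced in the body receive level 1.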

Semantic Biases. A typical semantic bias is to provide information about
the modes and types of a to-be-learned clause. This information can be
provided by specifying declarations of predicates in the background knowledge.
An example for specifying modes and types for append is given in figure 6.9:
If the first two arguments of append are instantiated (indicated by +list), this
predicate succeeds once (1); if the last argument is instantiated, it can
succeed finitely many times (*). The predicate is restricted to lists of integers. It
is obvious that such a semantic bias can restrict effort of search considerably.



mode(1, append(+list, +list, -list))
mode(*, append(-list, -list, +list))
list(nil) ←
list([X | T]) ← integer(X), list(T)

Fig. 6.9. Specifying Modes and Types for Predicates 



Another semantic bias, for example used in Foil (Quinlan, 1990) and 
Golem (Muggleton & Feng, 1990), is the notion of determinate clauses. 

Definition 6.3.15 (Determinate Clauses). A definite clause
$h \leftarrow l_1, \ldots, l_n$ is determinate (with respect to background
knowledge $B$ and examples $\mathcal{E}$) iff for every substitution
$\theta$ for $h$ that unifies $h$ to a ground instance $e \in \mathcal{E}$,
and for all $i = 1, \ldots, n$ there is a unique substitution $\theta_i$
such that $(l_1 \wedge \ldots \wedge l_i)\theta\theta_i$ is ground and is
true for $B \wedge \mathcal{E}$.

A clause is determinate if all its literals are determinate, and a literal
$l_i$ is determinate if each of its variables which do not appear in
preceding literals $l_j$, $j = 1, \ldots, i-1$, has only one possible
instantiation given the instantiations in
$\{l_j \mid j = 1, \ldots, i-1\}$. For the daughter example given in
table 6.7, the clause daughter(X, Y) ← female(X), parent(X, Y) is
determinate: If X is instantiated with mary, then Y can only be
instantiated with ann.

Restriction to determinate clauses can reduce the exponential growth of
literals in the body of a hypothesis during search. A special case of
determinacy is the so-called ij-determinacy. Parameter i represents the
maximum depth of variables (def. 6.3.13), parameter j represents the
maximal degree of dependency (def. 6.3.14) in determinate clauses
(Muggleton & Feng, 1990).

Definition 6.3.16 (ij-Determinacy of Clauses). A clause consisting of
a head only is 0j-determinate. A clause
$h \leftarrow l_1, \ldots, l_m, l_{m+1}, \ldots, l_n$ is ij-determinate iff

– $h \leftarrow l_1, \ldots, l_m$ is (i − 1)j-determinate, and

– every literal in $l_{m+1}, \ldots, l_n$ contains only determinate terms
and has a degree at most j.

The clause daughter(X, Y) ← female(X), parent(X, Y) is 11-determinate
because all variables in the body appear in the head (i = 1) and the
maximal degree is one (j = 1) for variable Y in parent(X, Y) which depends
on X.

It can be shown that the restriction to ij-determinate clauses allows for
polynomial effort of learning (Muggleton & Feng, 1990).

6.3.3.4 Learning Recursive Prolog Programs. In principle, all ILP systems
can be extended to learning recursive clauses, that is, they can be seen
as program synthesis systems, if the language bias is relaxed such that the
goal predicate is allowed to be used when constructing the body of a
hypothesis. For the systems MIS (Shapiro, 1983), Foil (Quinlan, 1990), and
Golem (Muggleton & Feng, 1990), for example, application to program
synthesis was demonstrated. In the following, we first show how learning
recursive clauses is realized with Golem and afterwards introduce some ILP
approaches which were proposed explicitly for program synthesis.

Learning reverse with Golem. The system Golem (Muggleton & Feng, 1990)
works with bottom-up generalization, calculating relative least general
generalizations (see def. 6.3.9). We give an example of how reverse(X, Y)
can be induced with Golem. In table 6.10 the necessary background
knowledge and the positive and negative examples are presented.

As described above, an rlgg is calculated between pairs of clauses of the
form e ← B where e is a positive example and B is the conjunction of
clauses from the background knowledge. The “trick” used for learning
recursive clauses with an rlgg technique is that the positive examples are
additionally made part of the background knowledge. In this way, the name
of the goal predicate can be introduced in the body of the hypothesis. The
notion of ij-determinacy (see def. 6.3.16) is used to avoid exponential
size explosion of rlggs.

Table 6.10. Background Knowledge and Examples for Learning reverse(X,Y)

%% positive examples
rev([],[]). rev([1],[1]). rev([2],[2]). rev([3],[3]). rev([4],[4]).
rev([1,2],[2,1]). rev([1,3],[3,1]). rev([1,4],[4,1]). rev([2,2],[2,2]).
rev([2,3],[3,2]). rev([2,4],[4,2]). rev([0,1,2],[2,1,0]).
rev([1,2,3],[3,2,1]).

%% negative examples
rev([1],[]). rev([0,1],[0,1]). rev([0,1,2],[2,0,1]).
app([1],[0],[0,1]).

%% background knowledge
!- mode(rev(+,-)).
!- mode(app(+,+,-)).
rev([],[]). ... rev([1,2,3],[3,2,1]). %% all positive examples
app([],[],[]). app([1],[],[1]). app([2],[],[2]). app([3],[],[3]).
app([4],[],[4]). app([],[1],[1]). app([],[2],[2]). app([],[3],[3]).
app([],[4],[4]). app([1],[0],[1,0]). app([2],[1],[2,1]).
app([3],[1],[3,1]). app([4],[1],[4,1]). app([2],[2],[2,2]).
app([3],[2],[3,2]). app([4],[2],[4,2]). app([2,1],[0],[2,1,0]).
app([3,2],[1],[3,2,1]).

For the first examples rev([], []) and rev([1], [1]) the rlgg is:

rev(X, X) ← rev([X], [X]), rev([2], [2]), ...,
            rev([X, 2], [2, X]), ..., rev([X, X, 2], [2, X, X]),
            app(X, X, X), app([X], [], [X]),
            app([2], [], [2]), ..., app([3, 2], [X], [3, 2, X]).

This hypothesis has a head which is still too special and a body which 
contains far too many literals. The head will become more general in the 
learning process when further positive examples are introduced. Literals from 
the body can be reduced by eliminating literals which cover negative exam- 
ples. In general, this can be a very time consuming task because all subsets 
of the literals in the body must be checked. In Muggleton and Feng (1990) a 
technique working with O(n^) is described. 

For the above example, it remains:

rev(X, X) ← app(X, X, Y).

The finally resulting program is:

rev([A|B], [C|D]) ← rev(B, E),
                    app(E, [A], [C|D]).

A trace generated by Golem for this example is given in appendix CC.2.

Note that no base case is learned, that is, interpretation of rev would
not terminate. Furthermore, the sequence of literals in a clause depends on
the sequence of literals in the background knowledge. If, in our example,
the append literals were presented before the rev literals, the resulting
program would have a body where app (append) appears before rev. With a
top-down, left-to-right strategy, such as that of Prolog, interpretation
of the clause would fail.

Instead of presenting the append predicate necessary for inducing reverse 
by ground instances, a predefined append function can be presented as back- 
ground knowledge. But this can make induction harder because for calculat- 
ing an rlgg the background knowledge must be grounded. That is, a recursive 
definition in the background knowledge must be instantiated and unfolded to 
some fixed depth. Because there might be many possible instantiations and 
because the “necessary” depth of unfolding is unknown, search effort can eas- 
ily explode. Without a predefined restriction of unfolding-depth, termination 
cannot be guaranteed. 

Special Purpose ILP Synthesis Systems. ILP systems explicitly designed for
learning recursive clauses are for example Tim (Idestam-Almquist, 1995),
Synapse (Flener, Popelinsky, & Stepankova, 1994), and Merlin (Bostrom,
1996). The hypothesis language of Tim consists of tail recursive clauses
together with base clauses. Learning is performed again by calculating
rlggs. Synapse synthesizes programs which conform to a divide-and-conquer
scheme and it incorporates predicate invention. Merlin learns tail
recursive clauses based on a grammar inference technique (see
sect. 6.3.1.2). For a given set of positive and negative examples, a
deterministic finite automaton is constructed which accepts all positive
and no negative examples. This can for example be realized using the ID
algorithm of Angluin (1981). The automaton is then transformed into a set
of Prolog rules. Merlin also uses predicate invention.

6.3.4 Inductive Functional Programming 

In contrast to the inductive approaches presented so far, the classical func- 
tional approach to inductive program synthesis is based on a two step process: 

Rewriting I/O Examples into constructive expressions: Example
computations, presented as input/output examples, are transformed into
traces and predicates. Each predicate characterizes the structure of one
input example. Each trace calculates the corresponding output for a given
input example.

Recurrence Detection: Regularities are searched for in the set of
predicate/trace pairs. These regularities are used for generating a
recursive generalization.

The different approaches typically use a simple strategy for the first step
which is not discussed in detail. The critical difference between the
functional synthesis algorithms is how the second step is realized. Two
pioneering approaches are from Summers (1977) and Biermann (1978). The work
of Summers was extended by several researchers, for example by Jouannaud
and Kodratoff (1979), Wysotzki (1983), and Le Blanc (1994).

In the following, we first present Summers’ approach in some detail,
because our own work is based on his approach. Afterwards we give a short
overview of Biermann’s approach and finally we discuss some extensions of
Summers’ work.

6.3.4.1 Summers’ Recurrence Relation Detection Approach.

Basic Concepts. Input to the synthesis algorithm (called Thesys) is a set
of input/output examples $E = \{e_1, \ldots, e_k\}$ with
$e_j = i_j \to o_j$, $j = 1, \ldots, k$. The output to be constructed is a
recursive Lisp function which generalizes over E. The following Lisp
primitives^10 are used:

– Atom nil: representing the empty list or the truth-value false.

– Predicate atom(x): which is true if x is an atom.

– Constructor cons(x, l): which inserts an element x in front of l.

– Basic functions: selectors car(x) and cdr(x), which return the first
element of a list and the list without its first element, respectively;
and compositions of selectors, which can be written for short as
$c\,x^{+}\,r$ with $x \in \{a, d\}$, for instance car(cdr(x)) = cadr(x).

– McCarthy-Conditional cond(b_1, ..., b_n): where the $b_i$ are pairs of
predicates and operations $p_i(x) \to f_i(x)$.
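To make the primitives concrete, the following Python sketch models them under a few assumptions of our own: a cons cell is a 2-tuple, nil is `None`, every non-tuple value counts as an atom, and `cond` receives its branches as (predicate, operation) pairs of functions. None of these encodings is prescribed by Summers' approach.

```python
NIL = None  # assumed encoding of the atom nil

def cons(x, l):
    """Constructor: insert x in front of l (a cons cell is a 2-tuple)."""
    return (x, l)

def atom(x):
    """True for everything that is not a cons cell, including nil."""
    return not isinstance(x, tuple)

def car(x):
    if atom(x):
        raise TypeError("car is undefined for atoms")
    return x[0]

def cdr(x):
    if atom(x):
        raise TypeError("cdr is undefined for atoms")
    return x[1]

def cadr(x):
    """Composition of selectors: car(cdr(x))."""
    return car(cdr(x))

def cond(x, *branches):
    """McCarthy conditional: apply the operation of the first branch
    whose predicate holds for x."""
    for predicate, operation in branches:
        if predicate(x):
            return operation(x)
    raise ValueError("no branch applies")

ab = cons("A", cons("B", NIL))   # the list (A B)
print(cadr(ab))
print(cond(ab, (atom, lambda x: NIL), (lambda x: True, car)))
```

The last line reads as (atom(x) → nil, T → car(x)) in the notation used here.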

Inputs and outputs of the searched for recursive function are represented
as single s-expressions (symbolic expressions):

Definition 6.3.17 (S-Expression). Atom nil and basic functions are
s-expressions. cons(s1, s2) is an s-expression, if s1 and s2 are
s-expressions.

With Summers’ method, Lisp functions with the following underlying
basic program scheme can be induced:

Definition 6.3.18 (Basic Program Scheme).

\[
F(x) \leftarrow (p_1(x) \to f_1(x),\; \ldots,\; p_k(x) \to f_k(x),\; T \to C(F(b(x)), x))
\]

where:

$p_1, \ldots, p_k$ are predicates of the form $atom(b_i(x))$,
$b, b_i$ are basic functions,
$f_1, \ldots, f_k$ are s-expressions,
$T$ is the truth-value “true”, and
$C(w, x)$ is a cons-expression with $w$ occurring exactly once.

^10 For better readability, we write the operator names, followed by
comma-separated arguments in parentheses, instead of the usual Lisp
notation. That is, for the Lisp expression (cons x l), we write cons(x, l).

This basic program scheme allows for arbitrary linear recursion; more
complex forms, such as tree recursion, are out of reach of this method. For
illustration we give an example presented in Summers (1977) in figure 6.10.
For better readability, we represent inputs and outputs as lists and not as
s-expressions. In the following, we describe how induction of a recursive
function is realized by (1) generating traces, and (2) detecting
regularities (recurrence) in traces.



Input/Output Examples:

{nil → nil,
 (A) → ((A)),
 (A B) → ((A) (B)),
 (A B C) → ((A) (B) (C))}

Recursive Generalization:

unpack(x) ← (atom(x) → nil,
             T → u(x))

u(x) ← (atom(cdr(x)) → cons(x, nil),
        T → cons(cons(car(x), nil), u(cdr(x))))

Fig. 6.10. Learning Function unpack from Examples 
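The recursive generalization of figure 6.10 can be executed directly. The following Python sketch assumes cons cells are 2-tuples and nil is `None`; the helper `lisp_list` is a convenience of this example only.

```python
NIL = None

def cons(x, l): return (x, l)
def car(x): return x[0]
def cdr(x): return x[1]
def atom(x): return not isinstance(x, tuple)

def lisp_list(*items):
    """Build the s-expression for the list (item1 item2 ...)."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def u(x):
    # u(x) <- (atom(cdr(x)) -> cons(x, nil),
    #          T -> cons(cons(car(x), nil), u(cdr(x))))
    if atom(cdr(x)):
        return cons(x, NIL)
    return cons(cons(car(x), NIL), u(cdr(x)))

def unpack(x):
    # unpack(x) <- (atom(x) -> nil, T -> u(x))
    return NIL if atom(x) else u(x)

examples = [
    (NIL, NIL),
    (lisp_list("A"), lisp_list(lisp_list("A"))),
    (lisp_list("A", "B"), lisp_list(lisp_list("A"), lisp_list("B"))),
    (lisp_list("A", "B", "C"),
     lisp_list(lisp_list("A"), lisp_list("B"), lisp_list("C"))),
]
print(all(unpack(i) == o for i, o in examples))
```

Because the program generalizes over the examples, it also handles longer lists that were not part of the example set.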



Generating Example Traces. In the first step of synthesis, the
input/output examples are transformed into a non-recursive program which
covers exactly the given examples. This transformation involves:

– constructing a trace $f_j$ which calculates $o_j$ for $i_j$ for all
examples $e_j$, $j = 1, \ldots, k$,

– determining a predicate which associates each input with the trace
calculating the desired output, and

– constructing a program with the following underlying scheme:

\[
F(x) \leftarrow (p_1(x) \to f_1(x),\; \ldots,\; p_k(x) \to f_k(x)).
\]

A trace is a function which calculates an output for a given input. In
Summers’ approach, such functions are always cons-expressions. For such
a cons-expression to exist, all atoms appearing in the output must also
appear in the input. That is, it is not possible to generate new symbols in
an output. For such a cons-expression to be uniquely determined, each atom
appearing in the output must appear exactly once in the input. The reason
for that restriction is, as we will see below, that the output is
constructed from the syntactical structure of the input and therefore the
position of each used atom must be unique.

The algorithm for constructing the traces is given in table 6.11.^11 An
example is given in figure 6.11.



Table 6.11. Constructing Traces from I/O Examples 

Let SE be the set of expressions over basic functions b, which describe
sub-expressions sub(x) of an s-expression x, associated with that function,
that is $SE = \{(b(x), sub(x))\}$.

Let x be an example input and y be the associated output.

\[
ST(x, y) = \begin{cases}
nil & y = nil \\
b(x) & y \neq nil \text{ and } (b(x), y) \in SE \\
cons(ST(x, car(y)), ST(x, cdr(y))) & \text{otherwise}
\end{cases}
\]



Sub-expressions of i = (A B) can be described by

SE = {(i, (A B)), (car(i), A), (cdr(i), (B)), (cadr(i), B), (cddr(i), nil)}

For the given examples, the traces constructed with ST are:

nil → nil: nil
(A) → ((A)): cons(x, nil)
(A B) → ((A) (B)): cons(cons(car(x), nil), cons(cdr(x), nil))
(A B C) → ((A) (B) (C)): cons(cons(car(x), nil), cons(cons(cadr(x), nil),
                          cons(cddr(x), nil)))



Fig. 6.11. Traces for the unpack Example 
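The trace construction can be sketched in a few lines of Python. The sketch assumes cons cells are 2-tuples and nil is `None`, enumerates SE by walking the input, and returns each trace as a string. It also assumes, as the text requires, that every atom of the output occurs exactly once in the input.

```python
NIL = None

def atom(x):
    return not isinstance(x, tuple)

def lisp_list(*items):
    result = NIL
    for item in reversed(items):
        result = (item, result)
    return result

def sub_expressions(x, path=""):
    """Yield the pairs (b(x), sub(x)) of SE; path collects the a/d letters
    of the composed selector, e.g. "ad" for cadr."""
    yield ("x" if path == "" else f"c{path}r(x)"), x
    if not atom(x):
        yield from sub_expressions(x[0], "a" + path)   # car of this part
        yield from sub_expressions(x[1], "d" + path)   # cdr of this part

def st(x, y):
    """Construct the trace that computes output y from input x."""
    if y is NIL:
        return "nil"
    for expression, sub in sub_expressions(x):
        if sub == y:
            return expression
    if atom(y):
        raise ValueError(f"atom {y!r} does not occur in the input")
    return f"cons({st(x, y[0])}, {st(x, y[1])})"

x = lisp_list("A", "B", "C")
y = lisp_list(lisp_list("A"), lisp_list("B"), lisp_list("C"))
print(st(x, y))
```

For the inputs of the unpack example this reproduces the traces listed in figure 6.11.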



In the next step, predicates must be determined which characterize the
example inputs in such a way that each input can be associated with its
correct trace: For the set of predicates it must hold that $p_i(x_i)$,
$i = 1, \ldots, k$, evaluates to T (true) and that the predicates
$p_j(x_i)$, $1 \le j < i$, evaluate to nil. Because the only predicate used
is atom, the inputs are discriminated by their structure only. For catching
characteristics of the atoms themselves, “semantic” predicates, such as
equal or greater, would be needed.

^11 ST stands for “semi-trace”. A semi-trace is a trace where the sequence
of evaluation of expressions is not fixed.

The searched for predicates have the form $atom(b_i(x))$ where $b_i$ is a
basic function. The algorithm for determining the predicates must construct
such basic functions that, if $x_i < x_{i+1}$, $b_i(x_i)$ is an atom, but
$b_i(x_{i+1})$ is not. To decide whether an input is smaller than another
input, we need an ordering relation over possible inputs (that is, over
s-expressions). In a first step, we reduce the inputs to their structure,
by replacing all atoms by identical atoms ω:



Definition 6.3.19 (Form of an S-Expression).

\[
form(x) = \begin{cases}
\omega & atom(x) = T \\
cons(form(car(x)), form(cdr(x))) & \text{otherwise.}
\end{cases}
\]



An example is given in table 6.12. 



Table 6.12. Calculating the Form of an S-Expression 

List (A (B C) D) is represented by
S-Expression (A.((B.(C.nil)).(D.nil))).
form((A.((B.(C.nil)).(D.nil)))) = (ω.((ω.(ω.ω)).(ω.ω)))



Now we can define an order relation over the set of forms of s-expressions:



Definition 6.3.20 (Complete Partial Order over S-Expressions).
The complete partial order over SF (forms of s-expressions) is recursively
defined as:

\[
\forall s \in SF : \omega \le s
\]
\[
\forall a, b, c, d \in SF : (a.b) \le (c.d) \Leftrightarrow (a \le c) \wedge (b \le d).
\]

The complete partial order over s-expressions S is:

\[
\forall s, t \in S : s \le t \Leftrightarrow form(s) \le form(t).
\]

Note that the order is partial, because not every pair of s-expressions
is comparable. For example, it holds neither that (A (B C)) ≤ ((A B) C)
nor that ((A B) C) ≤ (A (B C)). For constructing predicates, the example
inputs must be totally ordered. For the synthesized function it is assumed
that its hypothetical (infinite) input domain is totally ordered, too.
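The form of an s-expression and the order over forms can be sketched as follows. The Python encoding (2-tuples for cons cells, `None` for nil, the string "ω" for the form atom) is an assumption of this example.

```python
NIL = None
OMEGA = "ω"   # the single atom used in forms

def atom(x):
    return not isinstance(x, tuple)

def lisp_list(*items):
    result = NIL
    for item in reversed(items):
        result = (item, result)
    return result

def form(x):
    """Replace every atom (including nil) by ω, keeping the cons structure."""
    if atom(x):
        return OMEGA
    return (form(x[0]), form(x[1]))

def leq_form(s, t):
    """Order over forms: ω is below everything, pairs compare componentwise."""
    if s == OMEGA:
        return True
    if t == OMEGA:
        return False
    return leq_form(s[0], t[0]) and leq_form(s[1], t[1])

def leq(s, t):
    """Order over s-expressions, defined via their forms."""
    return leq_form(form(s), form(t))

a_bc = lisp_list("A", lisp_list("B", "C"))   # (A (B C))
ab_c = lisp_list(lisp_list("A", "B"), "C")   # ((A B) C)
print(leq(a_bc, ab_c), leq(ab_c, a_bc))
print(leq(lisp_list("A", "B"), lisp_list("A", "B", "C")))
```

The first two comparisons both fail, which illustrates that the order is only partial, while the example inputs of unpack form a chain.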

The algorithm for constructing predicates is defined as follows:

Definition 6.3.21 (Constructing Predicates).
$PredGen(x_i, x_{i+1}) = PG(x_i, x_{i+1}, I)$, where $I$ is the identity
function and

\[
PG(x, y, \theta) = \begin{cases}
\emptyset & atom(y) \\
\{\theta\} & atom(x) \text{ and } \neg atom(y) \\
PG(car(x), car(y), car \circ \theta) \cup PG(cdr(x), cdr(y), cdr \circ \theta) & \text{otherwise.}
\end{cases}
\]

For example, PredGen((A B), (A B C)) returns {cddr} and the resulting
predicate is atom(cddr(x)).
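PredGen can be sketched in Python as follows. As before, cons cells are assumed to be 2-tuples and nil is `None`. The composed selector θ is represented by its string of a/d letters, a representation chosen for this example only.

```python
NIL = None

def atom(x):
    return not isinstance(x, tuple)

def lisp_list(*items):
    result = NIL
    for item in reversed(items):
        result = (item, result)
    return result

def pg(x, y, theta=""):
    """Collect the selector paths at which x is an atom but y is not.
    theta = "" stands for the identity function I."""
    if atom(y):
        return set()
    if atom(x):
        return {theta}
    return pg(x[0], y[0], "a" + theta) | pg(x[1], y[1], "d" + theta)

def pred_gen(x, y):
    """Return the discriminating predicates as strings."""
    return {f"atom(c{path}r(x))" if path else "atom(x)"
            for path in pg(x, y)}

print(pred_gen(lisp_list("A", "B"), lisp_list("A", "B", "C")))
```

Applied to successive inputs of the unpack example, this yields atom(x), atom(cdr(x)), and atom(cddr(x)).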

The resulting non-recursive function which covers the set of input
examples is given in figure 6.12.

F(x) ← (atom(x) → nil,
        atom(cdr(x)) → cons(x, nil),
        atom(cddr(x)) → cons(cons(car(x), nil), cons(cdr(x), nil)),
        T → cons(cons(car(x), nil), cons(cons(cadr(x), nil),
                 cons(cddr(x), nil))))

Fig. 6.12. Result of the First Synthesis Step for unpack 



Recurrence Detection. The second step of synthesis is to generalize over
the given set of examples. This can be done with a purely syntactical
approach, by pattern matching: The finite program composed from traces is
checked for regularities between pairs of traces. This is done by
determining differences between $f_i(x)$ and $f_{i+c}(x)$ and checking
whether the recurrence relation $f_{i+c}(x) = C(f_i(b(x)), x)$ holds for
all examples. The differences and the resulting recurrence relation for the
unpack example are given in figure 6.13.



Differences:

f2(x) = cons(x, f1(x))
f3(x) = cons(cons(car(x), nil), f2(cdr(x)))
f4(x) = cons(cons(car(x), nil), f3(cdr(x)))

p2(x) = p1(cdr(x))
p3(x) = p2(cdr(x))
p4(x) = p3(cdr(x))

Recurrence Relation:

f1(x) = nil
f2(x) = cons(x, f1(x))
f(k+1)(x) = cons(cons(car(x), nil), f(k)(cdr(x))) for k = 2, 3

p1(x) = atom(x)
p(k+1)(x) = p(k)(cdr(x)) for k = 1, 2, 3
Fig. 6.13. Recurrence Relation for unpack 



In Summers (1977) no algorithm for calculating the regularities is
presented, except that he mentions that the algorithm is similar to
unification algorithms. A detailed description of a method for recurrence
detection in finite program terms is presented in chapter 7.
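Although no detection algorithm is given at this point, checking a candidate recurrence against the traces is straightforward. The following Python sketch represents the unpack traces as nested tuples, a term encoding assumed for this example, and tests whether f(k+1)(x) = cons(cons(car(x), nil), f(k)(cdr(x))) holds. It only verifies a given recurrence; it is not the detection method of chapter 7.

```python
def subst(term, replacement):
    """Replace every occurrence of the variable "x" in term."""
    if term == "x":
        return replacement
    if isinstance(term, tuple):
        # term[0] is the operator name, the rest are argument terms
        return (term[0],) + tuple(subst(arg, replacement) for arg in term[1:])
    return term   # the constant "nil"

def cons(a, b): return ("cons", a, b)
def car(t): return ("car", t)
def cdr(t): return ("cdr", t)

# Traces of the unpack example, with cadr and cddr written out.
f1 = "nil"
f2 = cons("x", "nil")
f3 = cons(cons(car("x"), "nil"), cons(cdr("x"), "nil"))
f4 = cons(cons(car("x"), "nil"),
          cons(cons(car(cdr("x")), "nil"),
               cons(cdr(cdr("x")), "nil")))

def step(previous):
    """Candidate recurrence C(f_k(b(x)), x) with b = cdr."""
    return cons(cons(car("x"), "nil"), subst(previous, cdr("x")))

print(step(f2) == f3, step(f3) == f4, step(f1) == f2)
```

The recurrence reproduces f3 from f2 and f4 from f3, but not f2 from f1, so f1 and f2 stay explicit cases.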

From a recurrence relation which holds for a set of examples for indices
$k = s, \ldots, t$, where $s$ and $t$ are constants, it is inductively
generalized that this relation holds for the complete domain
$k = s, \ldots, n$ with $n \to \infty$. Summers presented a theoretical
foundation for this generalization, in form of a set of synthesis
theorems. In the following, we present the central aspects of
Summers’ theory.

Synthesis Theorems. 

Definition 6.3.22 (Approximation of a Function). The k-th approximation
$F_k(x)$ of a function $F(x)$ is defined as

\[
F_k(x) \leftarrow (p_1(x) \to f_1(x),\; \ldots,\; p_{k-1}(x) \to f_{k-1}(x),\; T \to \Omega)
\]

where $\Omega$ is undefined.

That is, the finite function F(x) constructed over a set of examples, which
is undefined for all inputs with $p_{k-1}(x) = nil$ for all $k > l$ where
$l$ is a constant, can be seen as the k-th approximation of the function
F(x) to be induced.

Definition 6.3.23 (Recurrence Relation). If there exists an initial point
$j$ and an interval $n$ such that $1 \le j < k$ in a k-th approximation
$F_k(x)$ defined by a set of examples such that the following fragment
differences exist:

\[
\begin{aligned}
f_{j+n}(x) &= C(f_j(b_1(x)), x), \\
f_{j+n+1}(x) &= C(f_{j+1}(b_1(x)), x), \\
&\;\;\vdots \\
f_{k-1}(x) &= C(f_{k-n-1}(b_1(x)), x),
\end{aligned}
\]

and such that the following predicate differences exist:

\[
\begin{aligned}
p_{j+n}(x) &= p_j(b_2(x)), \\
p_{j+n+1}(x) &= p_{j+1}(b_2(x)), \\
&\;\;\vdots \\
p_{k-1}(x) &= p_{k-n-1}(b_2(x)),
\end{aligned}
\]

then we define the functional recurrence relation for the examples to be
$f_1(x), \ldots, f_j(x), \ldots, f_{j+n-1}(x), f_{i+n}(x) = C(f_i(b_1(x)), x)$
for $j \le i \le k-n-1$, and we define the predicate recurrence relation
for the examples to be
$p_1(x), \ldots, p_j(x), \ldots, p_{j+n-1}(x), p_{i+n}(x) = p_i(b_2(x))$
for $j \le i \le k-n-1$.

The index j gives the first position in the set of ordered traces where the
recurrence relation holds. All traces with smaller indices are kept as
explicit predicate/function pairs. For the unpack example, j = 2. The index
n gives a fixed interval for the recurrence, covering the fact that, in
general, recurrence can occur not only between immediately succeeding
traces. For the unpack example, n = 1.

Definition 6.3.24 (Induction). If a functional and a predicate recurrence
relation exist for a set of examples such that $k - j > 2n$, then we
inductively infer that these relationships hold for all $i \ge j$.

The condition $k - j > 2n$ ensures that at least one regularity (between
two traces) can be found in the examples. We will see in chapter 7 that,
when considering more general recursive schemes, at least four traces from
which regularities can be extracted are necessary.

Applying the recurrence relation induced from the examples, the set of
examples can be inductively extended to further approximations of the
searched for function F. The m-th approximation, with $m > j$, has the
form:

\[
F_m(x) \leftarrow (p_1(x) \to f_1(x),\; \ldots,\; p_k(x) \to f_k(x),\; p_{k+1}(x) \to f_{k+1}(x),\; \ldots,\; p_m(x) \to f_m(x),\; T \to \Omega).
\]

Lemma 6.3.1 (Order of Approximations). The set of approximating
functions, if it exists, is a chain with partial order $\le_F$, where
$F(x) \le_F G(x)$ holds if it is true that, for all x for which F(x) is
defined, F(x) = G(x).

Proof (Order of Approximations). We define
$D_m = \{x \mid p_1(x)\} \cup \ldots \cup \{x \mid p_m(x)\}$ to be the
domain of the approximating function $F_m(x)$. From this it is clear that
$F_i(x) \le_F F_{i+1}(x)$ for all i, since $D_i$ is a subset of $D_{i+1}$
(Summers, 1977, p. 167).

Now, the searched for function F(x), which was specified by examples,
can be defined as supremum of a chain of approximations:

Definition 6.3.25 (Supremum of a Chain of Approximations). If a set of
examples defines the chain $\langle \{F_m(x)\}, \le_F \rangle$, the
searched for function F(x), defined by the examples, is $\sup\{F_m(x)\}$
or the limit of $F_m(x)$ for $m \to \infty$.
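The chain of approximations can be illustrated for the unpack example. In the Python sketch below, `approximation(m)` builds the finite function with the first m predicate/trace pairs generated from the recurrence relation and returns the sentinel `OMEGA` for every other input. The sentinel, the helper names, and the tuple encoding of cons cells are assumptions of this example.

```python
NIL = None
OMEGA = object()   # stands for "undefined"

def cons(x, l): return (x, l)
def car(x): return x[0]
def cdr(x): return x[1]
def atom(x): return not isinstance(x, tuple)

def lisp_list(*items):
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def p(k, x):
    """p_1(x) = atom(x), p_(k+1)(x) = p_k(cdr(x))."""
    return atom(x) if k == 1 else p(k - 1, cdr(x))

def f(k, x):
    """f_1 = nil, f_2 = cons(x, nil),
    f_(k+1)(x) = cons(cons(car(x), nil), f_k(cdr(x)))."""
    if k == 1:
        return NIL
    if k == 2:
        return cons(x, NIL)
    return cons(cons(car(x), NIL), f(k - 1, cdr(x)))

def approximation(m):
    def F_m(x):
        for k in range(1, m + 1):   # predicates are tried in order
            if p(k, x):
                return f(k, x)
        return OMEGA
    return F_m

def unpack(x):
    """The folded recursive program (the supremum of the chain)."""
    if atom(x):
        return NIL
    if atom(cdr(x)):
        return cons(x, NIL)
    return cons(cons(car(x), NIL), unpack(cdr(x)))

inputs = [NIL, lisp_list("A"), lisp_list("A", "B"), lisp_list("A", "B", "C")]
for m in range(1, 5):
    defined = [x for x in inputs if approximation(m)(x) is not OMEGA]
    print(m, len(defined))
```

Each approximation is defined on one more input than its predecessor, and wherever an approximation is defined it agrees with all later ones and with the recursive program.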

The concepts introduced, correspond to the concepts used in the fixpoint 
theory of semantics of recursive functions (see appendix BB.1 and sect. 7.2
in chap. 7). Summers’ basic synthesis theorem represents the converse of 
this problem, that is, to find a recursive program from a recurrence relation 
characterization of a partial function. In other words, the synthesis problem 
is to induce a folding from a set of traces considered as unfolding of some 
unknown recursive function. 

Theorem 6.3.3 (Basic Synthesis). If a set of examples defines F(x) with
recurrence relations
$f_1(x), \ldots, f_n(x), f_{i+n}(x) = C(f_i(b(x)), x)$,
$p_1(x), \ldots, p_n(x), p_{i+n}(x) = p_i(b(x))$ for $i \ge 1$, then F(x)
is equivalent to the following recursive program:

\[
F(x) \leftarrow (p_1(x) \to f_1(x),\; \ldots,\; p_n(x) \to f_n(x),\; T \to C(F(b(x)), x)).
\]

Proof: see Summers (1977, p. 168).

This basic synthesis theorem provides the foundations of synthesis of
recursive programs from a finite set of examples. But it is based on
constraints which only allow for very restricted induction. The first
restriction is that recurrence relationships must hold for all traces, not
allowing a finite set of special cases which can be dealt with separately.
The second restriction is that the basic function b used in the predicate
and functional recurrences must be identical. Summers (1977) presents
extensions of the basic theorem which overcome these restrictions.

Summers points out that it would be desirable to give a formal charac- 
terization of the class of recursive functions which can be synthesized using 
this approach, but that such a formal characterization does not exist. In later 
work, Le Blanc (1994) tries to give such a characterization. In our own work, 
we give a structural characterization of the class of recursive programs which 
can be induced from finite programs (see chapter 7). A tight characterization 
of the class of (semantic) functions which can be synthesized from traces is 
still missing! 

Variable Addition. Summers (1977) presents an extension of his approach
for cases where predicate recurrence relations but no functional recurrence
relations can be detected in the traces. This problem was later also
discussed by Jouannaud and Kodratoff (1979) and Wysotzki (1983). Neither in
Summers’ nor in later work is it pointed out that this problem occurs
exactly in those cases where the underlying recurrence relation corresponds
to a simple loop, that is, a tail recursion. All authors propose solutions
based on the introduction of additional variables. This is a standard
method for transforming linear recursion into tail recursion in program
optimization (Field & Harrison, 1988).

We illustrate this problem with the reverse function given in figure 6.14.
For the initial traces, all differences have different forms, that is, no
recurrence relation can be derived from the examples. The following
heuristic can be used: Search for a subexpression a which is identical in
all traces. Rewrite the fragments to $g_i(x, a) = f_i(x)$, abstract to
$g(x, y)$, and try again to find recurrence relations.

For the reverse example, it holds that a = nil. After abstraction, for each
pair of traces two differences exist. One of these differences is regular
and can be transformed into a recurrence relation. For the $g_i(x, y)$, the
recursive function $G(x, y)$ can be induced and the resulting program is
$F(x) = G(x, a)$. Variable y plays the role of a collector, where the
resulting output is constructed, starting with the “neutral element” with
respect to cons, which is nil. In the terminating case, the current value
of y is returned.
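The role of the collector variable can be seen by running the induced program. The following Python sketch assumes cons cells are 2-tuples and nil is `None`, and transcribes reverse(x) = G(x, nil) with G(x, y) ← (atom(x) → y, T → G(cdr(x), cons(car(x), y))).

```python
NIL = None

def cons(x, l): return (x, l)
def car(x): return x[0]
def cdr(x): return x[1]
def atom(x): return not isinstance(x, tuple)

def lisp_list(*items):
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def G(x, y):
    """Tail-recursive helper: y collects the elements seen so far."""
    if atom(x):
        return y                         # terminating case: return collector
    return G(cdr(x), cons(car(x), y))    # move the first element onto y

def reverse(x):
    return G(x, NIL)                     # start with the neutral element nil

print(reverse(lisp_list("A", "B", "C")) == lisp_list("C", "B", "A"))
```

Each recursive call shortens x by one element and extends y by one, so the output is built in reverse order of the input.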

Examples:

{nil → nil,
 (A) → (A),
 (A B) → (B A),
 (A B C) → (C B A)}

Traces:

f1(x) = nil
f2(x) = cons(car(x), nil)
f3(x) = cons(cadr(x), cons(car(x), nil))
f4(x) = cons(caddr(x), cons(cadr(x), cons(car(x), nil)))

Differences:

f2(x) = cons(car(x), f1(x))
f3(x) = cons(cadr(x), f2(x))
f4(x) = cons(caddr(x), f3(x))

Variable Introduction:

g1(x, y) = y
g2(x, y) = cons(car(x), y)
g3(x, y) = cons(cadr(x), cons(car(x), y))
g4(x, y) = cons(caddr(x), cons(cadr(x), cons(car(x), y)))

Pairs of Differences: (a = anything)

g2(x, y) = {cons(car(x), g1(x, y)), g1(a, cons(car(x), y))}
g3(x, y) = {cons(cadr(x), g2(x, y)), g2(cdr(x), cons(car(x), y))}
g4(x, y) = {cons(caddr(x), g3(x, y)), g3(cdr(x), cons(car(x), y))}

Recurrence Relation:

f(i)(x) = g(i)(x, nil), i ≥ 1
g1(x, y) = y
g(k+1)(x, y) = g(k)(cdr(x), cons(car(x), y))

Recursive Program:

reverse(x) ← G(x, nil)
G(x, y) ← (atom(x) → y,
           T → G(cdr(x), cons(car(x), y)))

Fig. 6.14. Traces for the reverse Problem

6.3.4.2 Extensions of Summers’ Approach. The most well-known extensions of
Summers’ approach were presented by Jouannaud, Kodratoff and colleagues
(Kodratoff & J.Fargues, 1978; Jouannaud & Kodratoff, 1979; Kodratoff,
Franova, & Partridge, 1989). In the work of this group, the recursive
programs which can be synthesized from traces correspond to a more complex
program scheme than the one underlying Summers’ approach:

Definition 6.3.26 (Extension of Summers’ Program Scheme).

\[
\begin{aligned}
F(x) &\leftarrow f(x, i(x)) \\
f(x, z) &\leftarrow (p_1(x) \to f_1(x, z),\; \ldots,\; p_k(x) \to f_k(x, z),\; T \to h(x, f(b(x), g(x, z))))
\end{aligned}
\]

where:

b is a basic function,
i(x) is an initialization function,
h, g are programs satisfying the scheme.

The greater complexity of the programs to be synthesized is mainly obtained
by the matching algorithm introduced by this group, called BMWk (for
“Boyer, Moore, Wegbreit, Kodratoff”), which makes it possible to determine
how many variables must be introduced in the function to be constructed to
obtain regular differences between traces. An overview of this work is
given in Smith (1984).

Another extension was proposed by Wysotzki (1983). He reformulated
the second step of program synthesis, that is, recurrence detection in
finite programs, on a more abstract level: The programs to be synthesized
are characterized by term algebras instead of Lisp functions. Our work is
based on this extension and presented in detail in chapter 7. An additional
contribution of Wysotzki was to demonstrate that induction of recursive
programs can be applied to planning problems (such as the clearblock
problem presented in sect. 3.1.4.1 in chap. 3). This new area of
application becomes possible because, when representing programs over term
algebras, the choice of the interpreter function is free. Therefore,
function symbols might be interpreted by operations defined in a planning
domain (such as put or puttable, see chapter 2). Our work on learning
recursive control rules for planning (see chapter 8) is based on
Wysotzki’s ideas. Later on, Wysotzki (1987) presented an approach to
realize the first step of program synthesis, that is, constructing finite
programs, using universal planning. Extensions of these ideas are also
presented in chapter 8.

6.3.4.3 Biermann’s Function Merging Approach. An approach which
realizes the second step of synthesis differently from Summers’ approach was 
proposed by Biermann (1978). While Summers-like synthesis is based on 
detecting regularities in differences between pairs of traces, Biermann works 
with a single trace. A trace is represented by a set of non-recursive function 
calls and regularities are searched in differences between these functions. The 
basic Lisp operations used for generating traces from input/output examples 
are the same as in Summers’ approach. The program scheme for the to be 
synthesized functions, called semi-regular Lisp scheme, is: 

Definition 6.3.27 (Semi-Regular Lisp-Scheme).

\[
F_i(x) \leftarrow (p_1(x) \to f_1(x),\; \ldots,\; p_k(x) \to f_k(x),\; T \to f_{k+1}(x))
\]

where:

$p_1, \ldots, p_k$ are predicates of the form $atom(b_i(x))$,
$b_i$ is a basic function,
$f_1, \ldots, f_k$ have one of the following forms: nil, x, $F_j(car(x))$,
$F_j(cdr(x))$, $cons(F_g(x), F_h(x))$,

and it holds: $p_{j+1}(x) = atom(b_{j+1}(x)) \Rightarrow b_{j+1} = b_j \circ w$
with $w$ as basic function.

A set of functions corresponding to such schemes is a program, called a
regular Lisp program.

In the first step, a semi-trace is calculated from the given example, using
the algorithm ST (see tab. 6.11). This semi-trace is then transformed into
a trace by assigning a function definition to each sub-expression. An
example is given in table 6.13 (see also Smith, 1984; Flener, 1995). Note
that the given input/output example represents nested cons-pairs. A
cons-pair has the form form((a.b)) = (ω.ω).



Table 6.13. Constructing a Regular Lisp Program by Function Merging 
Example: ((a.b).(c.d)) → ((d.c).(b.a))

Resulting Semi-Trace: cons(cons(cddr(x), cadr(x)), cons(cdar(x), caar(x)))

Trace:

f1(x) = cons(f2(x), f3(x))
f2(x) = f4(cdr(x))
f3(x) = f5(car(x))
f4(x) = cons(f6(x), f7(x))
f5(x) = cons(f8(x), f9(x))
f6(x) = f10(cdr(x))
f7(x) = f11(car(x))
f8(x) = f12(cdr(x))
f9(x) = f13(car(x))
f10(x) = f11(x) = f12(x) = f13(x) = x

Resulting Program:

g1(x) ← (atom(x) → x,
         T → cons(g2(x), g3(x)))
g2(x) ← g1(cdr(x))
g3(x) ← g1(car(x))



A regular Lisp program can be represented as a directed graph where each
node represents a function. A directed arc represents a part of a conditional 
expression contained in the start node. Such a conditional expression is a pair 
of predicate and cons-expression. The goal nodes of an arc are function calls 
appearing within the cons-expression. An example is given in the last line (e) 
in figure 6.15. 

The graph representing the sought program is constructed by search with
backtracking. The given trace is processed top-down for the function calls
$f_i$, and for each function call it is checked whether
it can be merged (is identical) with an already existing node in the graph.
Merging results in cycles in the graph, representing recursion. If no merge is 
possible, a new arc is introduced, representing a new conditional expression. 

We demonstrate this method for the trace given in table 6.13 in figure 6.15:
Initially, a node g1 is created, representing function f1. Function f1 calls two
possibly different functions, that is, two arcs, representing two conditional
expressions, are introduced. Function call f2 could be either identical with g1
(case b.1) or not (case b.2). The first hypothesis fails and case b.2 is selected.
Function call f3 could be either identical to g1 (fails) or to g2 (fails) or
represent a new case. Because it cannot be merged with f1 or f2, a new node
is introduced (case c.3). Function calls f4 and f5 can be merged with f1 and
therefore can be represented by node g1 (case d). As a consequence, f6 and
f8 are identified with g2, and f7 and f9 with g3. Identification for f10 to f13
fails and a new arc is introduced. The resulting graph represents the final
recursive program given in table 6.13.

Fig. 6.15. Synthesis of a Regular Lisp Program (panel labels recoverable from
the figure: a: Node for function f1; d: Merging f4 and f5 with g1)
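The resulting program of table 6.13 can be checked against the input/output example with a small sketch. The encoding is an assumption made only for this illustration: cons-pairs are modelled as Python 2-tuples and atoms as strings; the function names g1, g2, g3 follow the table, while `semi_trace` is a hypothetical helper that spells out the semi-trace.

```python
# Sketch only: cons-pairs are 2-tuples, atoms are strings (assumed encoding).

def atom(x):
    return not isinstance(x, tuple)

def car(x):
    return x[0]

def cdr(x):
    return x[1]

def cons(a, b):
    return (a, b)

def g1(x):
    # g1(x) <- (atom(x) -> x, T -> cons(g2(x), g3(x)))
    return x if atom(x) else cons(g2(x), g3(x))

def g2(x):
    # g2(x) <- g1(cdr(x))
    return g1(cdr(x))

def g3(x):
    # g3(x) <- g1(car(x))
    return g1(car(x))

def semi_trace(x):
    # cons(cons(cddr(x), cadr(x)), cons(cdar(x), caar(x)))
    return cons(cons(cdr(cdr(x)), car(cdr(x))),
                cons(cdr(car(x)), car(car(x))))

example = cons(cons("a", "b"), cons("c", "d"))   # ((a.b).(c.d))
result = g1(example)                             # ((d.c).(b.a))
```

On the example, the recursive program and the non-recursive semi-trace agree; unlike the semi-trace, g1 is also defined for atoms and for deeper nestings of cons-pairs.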

In contrast to Summers’ approach, function merging is not restricted to
linear recursion. A drawback is that Biermann’s method corresponds to an
identification by enumeration approach (see sect. 6.3.1.2) and therefore is not
very efficient.

6.3.4.4 Generalization to N and Programming by Demonstration.

A subfield of inductive program synthesis only considers how traces or
protocols can be generalized to recursive programs (Bauer, 1975). Such
approaches only address the second step of synthesis - detecting recurrence
relations in traces. This area of research is called programming by demonstration
(Cypher, 1993; Schrodl & Edelkamp, 1999) or generalization-to-n (Shavlik,
1990; Cohen, 1992). As described for Summers’ recurrence detection method
above, these approaches are based on the idea that traces can be considered
as the k-th unfolding of an unknown recursive program. That is, in contrast
to the folding technique in program transformation (see sect. 6.2.2.1), in
inductive synthesis folding cannot rely on an already given recursive function!

Programming by Demonstration with Tinker. Programming by demonstration
is, for example, used for tutoring beginners in Lisp programming. The
system Tinker (Lieberman, 1993) can create Lisp programs from user
interactions with a graphical interface. For example, the user can manipulate
blocks-worlds with operations put and puttable (see chapter 2) and the system
creates Lisp code realizing the operation sequences performed by the user.
In the simplest case, generalization is performed by replacing constants
by variables. Tinker can induce simple linear recursive functions, such as the
clearblock function (see sect. 3.1.4.1 in chap. 3) from multiple examples (see
fig. 6.16). Conditional expressions are constructed by asking the user about
differences between examples.

Generalizing Traces using Grammar Inference. A recent approach to pro- 
gramming by demonstration was presented by Schrodl and Edelkamp (1999). 
The authors describe how the grammar inference algorithm ID (Angluin, 
1981) can be applied to induce control structures from traces. As an ex- 
ample they demonstrate how the loops for bubble-sort can be learned from 
user traces for sorting lists with a small number of elements using the swap- 
operation. The trace is enriched by built-in knowledge about conditional 
branches. A deterministic finite automaton (DFA) is learned from the ex- 
ample traces and membership queries to the user. The automaton accepts 
all prefixes of traces (that is, truncations of the original traces) of the target 
program. 

Fig. 6.16. Recursion Formation with Tinker

Explanation-Based Generalization-to-n. An approach related to program
synthesis from examples is explanation-based learning or generalization
(EBG) (Mitchell et al., 1986). A (possibly recursive) concept description is
learned from a small set of examples based on a large amount of background
knowledge. The background knowledge (domain theory) is used to generate
an “explanation” why a given example belongs to the target concept.
EBG is characterized as analytical learning: The constructed hypotheses do
not extend the deductive closure of the domain theory. Typically, Prolog is
used as hypothesis language (Mitchell, 1997, chap. 11). EBG techniques are,
for example, used in the planning system Prodigy for learning search control
rules (see sect. 2.5.2 in chap. 2).

An extension of EBG to generalization-to-n was presented by Shavlik
(1990). In a first step, explanations are generated, and in a second step an
attempt is made to detect repeatedly occurring sub-structures in an explanation. If such
sub-structures are found, the explanation is transformed into a recursive rule.



6.4 Final Comments 

6.4.1 Inductive versus Deductive Synthesis 

As we have seen in this section, both deductive and inductive approaches
depend heavily on search. In the theorem proving approaches, the search is
for a sequence of applications of programming axioms such that an axiom
defining the desired input/output relation becomes true. In the program
transformation approaches, the search is for a sequence of transformation
rules which, applied to a given input specification, results in an efficiently
executable program. In genetic programming, the search is for a syntactically
correct program which works correctly for all presented examples. In
inductive logic programming, the search is for a logic program which covers
all positive and none of the negative examples of the relation to be induced.
In inductive functional program synthesis, the search is for regularities in
traces which allow the traces to be interpreted as the k-th approximation of a
recursive function.

For all approaches, search is in general inefficient and must therefore
be restricted by knowledge. One possibility to restrict search in
deductive as well as inductive approaches is to provide program schemes: For
example, the program transformation system KIDS relies on sophisticated
program schemes for divide-and-conquer, local and global search. In inductive
logic programming, a syntactic bias is introduced by restricting the class of
clauses which can be induced. In inductive functional program synthesis, the
class of functions to be inferred is characterized by schemes.

We have seen for transformational synthesis that the step of transforming
a non-executable specification into an (inefficient) executable program cannot
be performed automatically. Instead, this step depends heavily on interaction
with a user. A similar bottleneck is the first step in inductive functional
program synthesis - transforming input/output examples into traces. In the
systems described in this section, this problem was reduced by allowing only
lists (and not numbers) as input and by considering only the structure but
not the content of the input examples. If these constraints are relaxed, there
is no longer a unique trace characterizing the transformation of an input
example into the desired output but potentially infinitely many, and it depends
on the generated traces whether a recurrence relation can be found or not. In
contrast, the second step of synthesis - identifying regularities in differences
between traces - can be performed in a straightforward way by pattern-matching.
The same is true for program transformation - there are several approaches
to automatic optimization of an executable program.

6.4.2 Inductive Functional versus Logic Programming 

There exists no systematic comparison of inductive logic and inductive
functional programming in the literature. A prerequisite would be to provide a
formal characterization of the class of programs which can be inferred within
each approach. In inductive functional programming, there exist some
preliminary proposals for such a characterization (Le Blanc, 1994). Possibly, the
theoretical framework of grammar inference could be used to describe what
class of recursive programs can be induced on the basis of what information.
Currently, we can offer only some informal observations to compare inductive
logic and functional programming:

ILP typically depends on positive and negative examples, where negative
examples are used to remove literals from the clauses to be learned. Inductive
functional programming uses positive examples only, but these examples must
represent the first k elements of a totally ordered input domain. Because of
this restriction, inductive functional programming needs far fewer examples
- ranging from one to about four - than ILP, where typically more
than ten positive examples must be presented.

In inductive functional programming, the resulting recursive function is
independent of the sequence in which the input/output examples are
presented, while in ILP different programs can result for different example
sequences. The reason is that inductive functional programming bases the
construction of predicates which discriminate between examples on knowledge
about the order of the input domain, while ILP stepwise includes positive
examples to construct a hypothesis.

Inductive functional programming works in a two-step process - first
transforming input/output examples into traces and then identifying
regularities in traces. This corresponds to a bottom-up, data-driven approach
to learning. ILP, regardless of whether it works with top-down refinement or
bottom-up generalization, incrementally modifies an initial hypothesis by
adding or deleting literals.

In inductive functional programming, the resulting program is executable,
while this need not be true for ILP: typically neither a base case for a
recursive clause is induced, nor is an order for the literals in the body provided.
It is a fundamental difference between logic and functional programs that
for logic programs control flow is provided by the interpreter strategy (e.g.,
SLD-resolution), while functional programs implement a fixed control
structure. Furthermore, functional programs typically have one parameter fewer
than logic programs. In the first case, a function transforms a given input
into an output value. In the second case, an input/output relation is proved
and it is possible either to construct an output such that the relation holds
for the given input or to construct an input such that the relation holds for
the desired output.

Because functions are a subset of relations, namely relations which are
defined for all inputs and have a unique output, it can be conjectured that in
general, induction of functions is a harder problem than induction of relations.

"It fits into the pattern, I think." "Ah," said Mr. Rattisbon who knew Alleyn.
"The pattern. Your pet theory, Chief Inspector." "Yes, Sir, my pet theory. I
hope you may provide me with another lozenge in the pattern."

— Ngaio Marsh, Death of a Peer, 1940 



In this chapter we present our approach to folding finite program terms into
recursive programs. That is, we address inductive program synthesis from
traces. This problem is researched in inductive synthesis of functional
programs - as second step of synthesis - and in programming by demonstration
(see chap. 6). Traces can be provided to the system by the system user, they
can be recorded from interactions of a user with a program, or they can be
constructed over a set of input/output examples. In the next chapter (chap. 8)
we will describe how finite programs can be generated by planning. Folding
of finite programs into recursive programs is a complex problem itself. Our
approach allows folding of recursive programs which can be characterized by
sets of context-free term rewrite rules. It allows inferring programs consisting
of a set of recursive equations, and dealing with interdependent and hidden
parameters. Traces can be generic or over instantiated parameters and they
can contain incomplete paths. That is, at least for the second step of program
synthesis, our approach is more powerful than other approaches discussed in
the literature (see sect. 6.3 in chap. 6).

In the following we will first introduce terms and recursive program
schemes as background for our approach (sect. 7.1), then we will formulate
the synthesis problem (sect. 7.2), afterwards we present theoretical results
and algorithms (sect. 7.3), and finally, we give some examples (sect. 7.4).¹

In appendix AA.4 we give a short overview of the folding algorithms we
implemented over recent years, and in appendix AA.5 we give some details about
the current implementation which is based on the approach presented in this
chapter.



¹ This chapter is based on the previous publications Schmid and Wysotzki (1998)
and Mühlpfordt and Schmid (1998), and the diploma theses of Mühlpfordt (2000)
and Pisch (2000). Formalization, proof, and implementation are mainly based on
the thesis of Mühlpfordt (2000).



U. Schmid: Inductive Synthesis of Functional Programs, LNAI 2654, pp. 167-225, 2003.
© Springer-Verlag Berlin Heidelberg 2003

7.1 Terminology and Basic Concepts

In contrast to other approaches to the synthesis of recursive programs, the
approach presented here is independent of a given programming language.
Instead, programs are characterized as elements of some arbitrary term algebra
as proposed by Wysotzki (1983), based on the theory of recursive programs
developed by Courcelle and Nivat (1978). A similar proposal was made by
Le Blanc (1994) who formalized inductive program synthesis in a term
rewriting framework. The goal of folding is to induce a recursive program scheme
from some arbitrary finite program term. Our approach is purely syntactic,
that is, it works on the structure of given program terms, independent
of the interpretation of the symbols over which a term is constructed. The
overall idea is to detect a recurrence relation in a finite program term. Our
approach is an extension of Summers’ synthesis theorem which was presented
in detail in section 6.3.4.1 in chapter 6.

We first introduce some basic concepts and notations for terms, mostly
following (Dershowitz & Jouannaud, 1990), then we introduce first order
patterns and anti-unification (Burghardt & Heinz, 1996), and finally, recursive
program schemes (Courcelle & Nivat, 1978).

7.1.1 Terms and Term Rewriting 

Definition 7.1.1 (Signature). A signature $\Sigma$ is a set of (function) symbols
with $\alpha : \Sigma \to \mathbb{N}$ giving the arity of a symbol. The set $\Sigma_n \subseteq \Sigma$ represents
all symbols with fixed arity $n$, where symbols in $\Sigma_0$ represent constants. A
signature can be defined over a set of variables $X$ with $X \cap \Sigma = \emptyset$.

Although we are only concerned with the syntactical structure of terms
built over a signature, in table 7.1 we give a sample of function symbols
and their usual interpretation which we will use in illustrative examples. For
better readability, we will sometimes represent numbers or lists in their usual
notation instead of as constructor terms (e.g., 3 instead of succ(succ(succ(0)))
or [9, 4, 7] instead of cons(9, cons(4, cons(7, nil)))).

Definition 7.1.2 (Term). For the set $T_\Sigma(X)$ of terms over a signature $\Sigma$
and a set of variables $X$ holds:

1. $X \subseteq T_\Sigma(X)$,

2. $\Sigma_0 \subseteq T_\Sigma(X)$, and

3. $t_1, \ldots, t_n \in T_\Sigma(X),\ f \in \Sigma_n \Rightarrow f(t_1, \ldots, t_n) \in T_\Sigma(X)$.

Terms without variables are called ground terms and the set of all ground
terms is denoted as $T_\Sigma$. For a term $t = f(t_1, \ldots, t_n)$ the terms $t_i$ ($i = 1 \ldots n$)
are called sub-terms of $t$. Mapping $\mathrm{var} : T_\Sigma(X) \to X$ returns all variables in
a term.

Table 7.1. A Sample of Function Symbols

Symbol  Arity  Usual interpretation
0       0      natural number zero
1       0      natural number one
nil     0      empty list, truth value “false”
T       0      truth value “true”
succ    1      successor of a natural number
pred    1      predecessor of a natural number
head    1      head of a list
tail    1      tail of a list
eq0     1      test whether a natural number is equal to zero
eq1     1      test whether a natural number is equal to one
empty   1      test whether a list is empty
cons    2      insertion of an element in front of a list
+       2      addition of two numbers
*       2      multiplication of two numbers
eq      2      equality-test for two symbols
<       2      test whether a natural number is smaller than another
if      3      conditional “if x then y else z”



In the following we use the expressions term, program term, and tree as 
synonyms. Each symbol contained in a term is called a node in the term. 

Definition 7.1.3 (Position). Positions in a term $t \in T_\Sigma(X)$ are defined
by:

- $\lambda$ is the root-position of $t$,

- if $t = f(t_1, \ldots, t_n)$ and $u$ is a position in $t_i$ then $i.u$ is a position in $t$.

An example for a term together with an illustration of positions in the 
term is given in figure 7.1. 

Definition 7.1.4 (Composition and Order of Positions). The composition
$v \circ w$ of two positions $v$ and $w$ is defined as $u.w$ if $v = u.\lambda$.

A partial order over positions is defined in the following way:

The relation $v \leq w$ is true iff

- $v = w$, or

- there exists a position $u$ with $v \circ u = w$.

Definition 7.1.5 (Sub-term at a Position). A sub-term of a term $t \in T_\Sigma(X)$
at a position $u$ in $t$ (written $t|_u$) is defined as

- $t|_\lambda = t$,

- if $t = f(t_1, \ldots, t_n)$ and $u$ a position in $t_i$, then $t|_{i.u} = t_i|_u$.

Definition 7.1.6 (Positions of a Node). Function $\mathrm{node} : T_\Sigma(X) \times \mathit{Position} \to \Sigma$
returns the symbol at a fixed position in a term:
$\mathrm{node}(t, u) = f$ with $t \in T_\Sigma(X)$, a position $u$ with $t|_u = f(t_1, \ldots, t_n)$, and
$f \in \Sigma$.
The set of all positions at which a fixed symbol $f$ appears in a term is denoted
$\mathrm{pos}(t, f)$ and it holds $\forall w \in \mathrm{pos}(t, f) : \mathrm{node}(t, w) = f$.

The definitions can be extended to $\Sigma \cup X$. If necessary, the definition of node
can be extended to return additionally the arity of a symbol. With $\mathrm{pos}(t)$
we refer to the set of all positions for a term $t$ and with $\mathrm{lpos}(t)$ we refer to
the set of all positions of leaf-nodes in a term.

t = if(<(k,n),k,if(<(-(k,n),n),k,if(<(-(-(k,n),n),n),k,0)))
Fig. 7.1. Example Term with Exemplary Positions of Sub-terms

For the term in figure 7.1, a sub-term is for example $t|_{3.3.1.\lambda}$ = <(-(-(k, n), n), n)
and a set of positions is for example pos(t, <) = {1.λ, 3.1.λ, 3.3.1.λ}.
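The position-related definitions can be tried out on the term of figure 7.1 with a small sketch. The encoding is an assumption of this illustration, not part of the formal framework: every node is a Python tuple `(symbol, child_1, ..., child_n)`, leaves are 1-tuples, and a position is a tuple of 1-based child indices with the empty tuple standing for the root position λ.

```python
# Sketch only: terms as nested tuples (symbol, *children); positions as
# tuples of 1-based indices, () playing the role of the root position.

def positions(t, prefix=()):
    """Yield all positions of term t (pos(t) in the text)."""
    yield prefix
    for i, child in enumerate(t[1:], start=1):
        yield from positions(child, prefix + (i,))

def subterm(t, u):
    """Sub-term of t at position u (written t|u in the text)."""
    for i in u:
        t = t[i]          # t[0] is the symbol, t[i] the i-th child
    return t

def node(t, u):
    """Symbol at position u."""
    return subterm(t, u)[0]

def pos(t, f):
    """All positions at which symbol f occurs in t."""
    return {u for u in positions(t) if node(t, u) == f}

def lpos(t):
    """All positions of leaf nodes of t."""
    return {u for u in positions(t) if len(subterm(t, u)) == 1}

k, n, zero = ("k",), ("n",), ("0",)
minus1 = ("-", k, n)
minus2 = ("-", minus1, n)
# t = if(<(k,n), k, if(<(-(k,n),n), k, if(<(-(-(k,n),n),n), k, 0)))
t = ("if", ("<", k, n), k,
     ("if", ("<", minus1, n), k,
      ("if", ("<", minus2, n), k, zero)))
```

With this encoding, `subterm(t, (3, 3, 1))` corresponds to the sub-term at position 3.3.1.λ and `pos(t, "<")` to the set {1.λ, 3.1.λ, 3.3.1.λ} given above.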

Definition 7.1.7 (Replacement of a Term at a Position). The replacement
of a sub-term $t|_w$ by a term $s \in T_\Sigma(X)$ in a term $t \in T_\Sigma(X)$ is written
as $t[w \leftarrow s]$.

Definition 7.1.8 (Substitution). A substitution $\sigma : X \to T_\Sigma(X)$ is a
mapping of variables to terms. The extension of substitution for terms is
$\sigma(f(t_1, \ldots, t_n)) = f(\sigma(t_1), \ldots, \sigma(t_n))$. Substitutions over a term are
enumerated as $t\{x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n\}$ or $t[t_1, \ldots, t_n]$ for short.

Definition 7.1.9 (Term Rewrite System). A term rewrite system over
a signature $\Sigma$ is a set of pairs of terms $\mathcal{R} \subseteq T_\Sigma(X) \times T_\Sigma(X)$. The elements
$(l, r)$ of $\mathcal{R}$ are called rewrite rules and are written as $l \to r$.

Definition 7.1.10 (Term Rewriting). Let $\mathcal{R}$ be a term rewrite system
and $t, t' \in T_\Sigma(X)$ two terms. Term $t'$ can be derived in one rewrite step
from $t$ using $\mathcal{R}$ ($t \to_\mathcal{R} t'$), if there exists a position $u$ in $t$, a rule $l \to r \in \mathcal{R}$,
and a substitution $\sigma : X \to T_\Sigma(X)$, such that holds

- $t|_u = \sigma(l)$,

- $t' = t[u \leftarrow \sigma(r)]$.

$\mathcal{R}$ implies a rewrite relation $\to_\mathcal{R} \subseteq T_\Sigma(X) \times T_\Sigma(X)$ with $(t, t') \in \to_\mathcal{R}$ if
$t \to_\mathcal{R} t'$. The reflexive and transitive closure of $\to_\mathcal{R}$ is $\to^*_\mathcal{R}$.
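A single rewrite step as defined above combines matching, substitution, and replacement at a position. The following is a minimal sketch under assumptions chosen for this illustration only: variables are Python strings, function applications are tuples `(symbol, *arguments)`, positions are tuples of 1-based indices, and the rule +(x, 0) → x is a hypothetical example rule, not one taken from the text.

```python
# Sketch only: variables are str, applications are tuples (symbol, *args).

def substitute(t, sigma):
    """Apply substitution sigma (dict: variable -> term) to term t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(substitute(a, sigma) for a in t[1:])

def match(pattern, t, sigma=None):
    """Return sigma with substitute(pattern, sigma) == t, or None."""
    sigma = dict(sigma or {})
    if isinstance(pattern, str):                  # pattern is a variable
        if pattern in sigma and sigma[pattern] != t:
            return None
        sigma[pattern] = t
        return sigma
    if isinstance(t, str) or pattern[0] != t[0] or len(pattern) != len(t):
        return None
    for p_arg, t_arg in zip(pattern[1:], t[1:]):
        sigma = match(p_arg, t_arg, sigma)
        if sigma is None:
            return None
    return sigma

def replace(t, u, s):
    """t[u <- s]: replace the sub-term of t at position u by s."""
    if not u:
        return s
    i = u[0]
    return t[:i] + (replace(t[i], u[1:], s),) + t[i + 1:]

def rewrite_at(t, u, rule):
    """One rewrite step with rule (l, r) at position u, or None."""
    l, r = rule
    sub = t
    for i in u:
        sub = sub[i]
    sigma = match(l, sub)
    if sigma is None:
        return None
    return replace(t, u, substitute(r, sigma))

zero = ("0",)
rule = (("+", "x", zero), "x")                    # hypothetical rule +(x, 0) -> x
term = ("succ", ("+", ("succ", zero), zero))      # succ(+(succ(0), 0))
step = rewrite_at(term, (1,), rule)               # rewrite at position 1
```

The sketch performs the step only at a position supplied by the caller; enumerating all positions and rules would give the full one-step relation.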

7.1.2 Patterns and Anti-unification 

A term $t = \sigma(p)$ which can be generated by substitutions over a term $p$ is a
specialization of $p$ and $p$ is called (first order) pattern of $t$.

Definition 7.1.11 (First Order Pattern). A term $p \in T_\Sigma(Y)$ is called
first order pattern of a term $t \in T_\Sigma(X)$, if there exists a subsumption relation
$p \leq t$ with $\sigma : Y \to T_\Sigma(X)$ and $\sigma(p) = t$. The sets of variables $Y$ and $X$ are
disjoint.

A pattern $p$ of a term $t$ is called trivial, if $p$ is a variable and non-trivial
otherwise.

Definition 7.1.12 (Order over Terms). For two terms over the same
signature $p, t \in T_\Sigma(X)$ holds $p \leq t$ if $p$ is a pattern of $t$ and $p < t$ if there exists
no variable renaming such that $p$ and $t$ are unifiable.

Definition 7.1.13 (Maximal Pattern). A term p is called maximal (first 
order) pattern of a term t, if p < t and there exists no term p' with p < p' 
and p' < t. 

An example for a first order pattern of a term is given in figure 7.2. For 
all non-trivial patterns holds that terms which can be subsumed by a pattern 
share a common prefix. 

A (maximal) pattern of two terms can be constructed by syntactic anti- 
unification of these terms. 

Definition 7.1.14 (Anti-Unificator). A term $p$ is called (first-order)
anti-unificator of two terms $t$ and $t'$, if $p \leq t$ and $p \leq t'$ and if for all terms $p'$ with
$p' \leq t$ and $p' \leq t'$ is $p' \leq p$. That is, the anti-unificator is the most specific
generalization of two terms and it is unique except for variable renaming.

Definition 7.1.15 (Anti-Unification). Anti-unification $\sqcap : T_\Sigma(X) \times
T_\Sigma(X) \to T_\Sigma(X \cup Y)$ is defined as:

- $f(t_1, \ldots, t_n) \sqcap f(t'_1, \ldots, t'_n) = f(t_1 \sqcap t'_1, \ldots, t_n \sqcap t'_n)$,

- $f(t_1, \ldots, t_n) \sqcap f'(t'_1, \ldots, t'_m) = \varphi(f(t_1, \ldots, t_n), f'(t'_1, \ldots, t'_m))$ for $f \neq f'$

with $\varphi : T_\Sigma(X) \times T_\Sigma(X) \to Y$ as injective mapping of terms in a (new) set
of variables.

Fig. 7.2. Example First Order Pattern

Mapping $\varphi$ guarantees that the mapping of pairs of terms to variables is
unique in both directions. Identical sub-term pairs are replaced by the same
variable over the complete term (Lassez, Maher, & Marriott, 1988).

Anti-unification can be extended from pairs of terms to sets of terms. 

Definition 7.1.16 (Anti-Unification of Sets of Terms). Anti-unification
of a set of terms $T$, $\sqcap : T_\Sigma(X)^+ \to T_\Sigma(X \cup Y)$, is defined as:

$$\sqcap\, T = \begin{cases} t & \text{if } T = \{t\} \\ (\sqcap (T \setminus \{t\})) \sqcap t & \text{otherwise.} \end{cases}$$

First order anti-unification is distributive, that is, terms in $T$ can be
anti-unified in an arbitrary order (Lassez et al., 1988).

An example for anti-unification is given in figure 7.3. 
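The two cases of definition 7.1.15 translate almost directly into a recursive function. The following is a minimal sketch for pairs of terms, under assumptions made only for this illustration: variables are Python strings, function applications are tuples `(symbol, *arguments)`, the injective mapping is realized by a dictionary from term pairs to fresh variable names y1, y2, ..., and identical variables are generalized to themselves (a case the definition leaves implicit).

```python
# Sketch only: variables are str, applications are tuples (symbol, *args).

def anti_unify(s, t, table=None):
    """Most specific generalization of s and t (first-order, syntactic)."""
    if table is None:
        table = {}                      # plays the role of the injective mapping
    both_apps = not isinstance(s, str) and not isinstance(t, str)
    if both_apps and s[0] == t[0] and len(s) == len(t):
        # same root symbol: anti-unify argument-wise
        return (s[0],) + tuple(anti_unify(a, b, table)
                               for a, b in zip(s[1:], t[1:]))
    if s == t:                          # identical variables (assumption of this sketch)
        return s
    if (s, t) not in table:             # identical pairs share one variable
        table[(s, t)] = f"y{len(table) + 1}"
    return table[(s, t)]

zero, nil = ("0",), ("nil",)
one = ("succ", zero)
s = ("cons", one, ("cons", one, nil))    # [1, 1]
t = ("cons", zero, ("cons", zero, nil))  # [0, 0]
pattern = anti_unify(s, t)               # cons(y1, cons(y1, nil))
```

Because the pair (succ(0), 0) occurs twice, both occurrences are replaced by the same variable, so the pattern still records that the two list elements are equal.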



7.1.3 Recursive Program Schemes 

7.1.3.1 Basic Definitions. The notion of a program scheme allows us to
characterize the class of recursive functions which can be induced in program
synthesis.

Fig. 7.3. Anti-Unification of Two Terms

Definition 7.1.17 (Recursive Program Scheme). Let $\Sigma$ be a signature
and $\Phi = \{G_1, \ldots, G_n\}$ a set of function variables with $\Sigma \cap \Phi = \emptyset$ and arity
$\alpha(G_i) = m_i > 0$. A recursive program scheme (RPS) $\mathcal{S}$ is a pair $(\mathcal{G}, t_0)$ with
$t_0 \in T_{\Sigma \cup \Phi}(X)$ and $\mathcal{G}$ as a system of $n$ equations:

$$\mathcal{G} = \langle G_1(x_1, \ldots, x_{m_1}) = t_1,\ \ldots,\ G_n(x_1, \ldots, x_{m_n}) = t_n \rangle$$

with $t_i \in T_{\Sigma \cup \Phi}(X)$, $i = 1 \ldots n$.



An RPS corresponds to an abstract functional program with $t_0$ as initial
program call (“main program”) and $\mathcal{G}$ as a set of (recursive) functions
(“sub-programs”). The left-hand side of an equation in $\mathcal{G}$ is called program head,
the right-hand side program body. The parameters $x_i$ in a program head are
pairwise different. The body $t_i$ of an equation $G_i$ can contain calls of $G_i$ or
calls of other equations in $\mathcal{G}$.

In the following, RPSs are restricted in the following way: For each
program body $t_i$ of an equation in $\mathcal{G}$ holds

- $\mathrm{var}(t_i) = \{x_1, \ldots, x_{m_i}\}$, that is, all variables in the program head are used
in the program body, and

- $\mathrm{pos}(t_i, G_i) \neq \emptyset$, that is, each equation is recursive.



The first restriction does not limit expressiveness: Variables which never are 
used in the body of a program do not contribute to the computation of a 
result. The second restriction ensures that induction of an RPS from a finite 
program term is really due to generalization (learning) and that the resulting 
RPS does not just represent the given finite program itself. 



174 7. Folding of Finite Program Terms 



An example for an RPS is given in table 7.2. Equation ModList checks
for each element of a list of natural numbers whether it is a multiple of a
number n. The equation is linear recursive because it contains a call of itself
as argument of the cons-operator and it calls a second equation Mod which
is tail-recursive, that is, a special case of linear recursion, corresponding to
a loop. The program call $t_0$ in this example is simply a call of equation
ModList. In general, $t_0$ can be a more complex expression from $T_{\Sigma \cup \Phi}(X)$.
For example, we might only be interested whether the second element of the
list is a multiple of n and call $t_0$ = head(tail(ModList(l, n))).

Interpretation of an RPS, for example by means of an eval-apply method
(Field & Harrison, 1988), presupposes that all symbols in $\Sigma$ are available as
pre-defined operations and that the variables in $t_0$ are instantiated. An RPS
can be unfolded by replacing function calls by the corresponding function
body where variables in the body are substituted in accordance with the
parameters in the function call. Generation of program terms by unfolding is
described in the next section.

Table 7.2. Example for an RPS

S = (G, t0)

G = <
  ModList(l, n) = if(empty(l),
                     nil,
                     cons(if(eq0(Mod(n, head(l))), T, nil),
                          ModList(tail(l), n))),
  Mod(k, n) = if(<(k, n), k, Mod(-(k, n), n))
>

t0 = ModList(l, n)



Variable instantiation is a special case of substitution (see def. 7.1.8) where 
variables are replaced by ground terms: 

Definition 7.1.18 (Variable Instantiation). A mapping $\beta : X \to T_\Sigma$ is
called instantiation of variables $X$. Variable instantiation can be extended to
terms: $\beta : T_\Sigma(X) \to T_\Sigma$ such that $\beta(f(t_1, \ldots, t_n)) = f(\beta(t_1), \ldots, \beta(t_n))$ for
$f \in \Sigma$, arity $\alpha(f) = n$, and $t_1, \ldots, t_n \in T_\Sigma(X)$.

Please note that we speak of the (formal) parameters of a program to
refer to its arguments. For example, the recursive equation ModList(l, n) has
two parameters. The parameters can be variables, such as $n \in X$, or be
instantiated with (ground) terms.

7.1.3.2 RPSs as Term Rewrite Systems. A recursive program scheme
can be viewed as term rewrite system (Courcelle, 1990) or, alternatively, as
context-free tree-grammar (see next section).

An RPS as introduced in definition 7.1.17 can be interpreted as term
rewrite system as introduced in definition 7.1.9:

Definition 7.1.19 (RPS as Rewrite System). Let $\mathcal{S} = (\mathcal{G}, t_0)$ be an RPS
and $\Omega$ a special symbol. The equations in $\mathcal{G}$ constitute rules $\mathcal{R}_\mathcal{S}$ of a term
rewrite system. The system additionally contains rules $\mathcal{R}_\Omega$:

$$\mathcal{R}_\mathcal{S} = \{G_i(x_1, \ldots, x_{m_i}) \to t_i \mid i \in \{1, \ldots, n\}\}$$

$$\mathcal{R}_\Omega = \{G_i(x_1, \ldots, x_{m_i}) \to \Omega \mid i \in \{1, \ldots, n\}\}$$

We write $\to^*_{\mathcal{S},\Omega}$ for the reflexive and transitive closure of the rewrite relation
implied by $\mathcal{R}_\mathcal{S} \cup \mathcal{R}_\Omega$.

By means of the term rewrite system, the head of a user-defined equation
(occurring in $t_0$ or a program body $t_i$) is either replaced by the associated
body or by symbol $\Omega$ which represents the empty or undefined term.

Definition 7.1.20 (Language of an RPS). Let $\mathcal{S} = (\mathcal{G}, t_0)$ be an RPS
and $\beta : \mathrm{var}(t_0) \to T_\Sigma$ an instantiation of the parameters in $t_0$. The set of all
terms $\mathcal{L}(\mathcal{S}, \beta) = \{t \mid t \in T_{\Sigma \cup \{\Omega\}},\ \beta(t_0) \to^*_{\mathcal{S},\Omega} t\}$ is the language generated by
the RPS with instantiation $\beta$. For a main program $t_0$ without variables, we
can write $\mathcal{L}(\mathcal{S})$.

The instantiation of all parameters in program call $t_0$ implies that the
language of an RPS is a set of ground terms, containing neither object variables
$X$ nor function variables $\Phi$. That is, all recursive calls are at some rewrite
step replaced by $\Omega$s.
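How elements of such a language arise can be illustrated for the Mod equation of table 7.2: a call is rewritten by its body a number of times and the remaining call is finally rewritten to Ω. The following is a minimal sketch under assumptions made only for this illustration: variables are Python strings, applications are tuples `(symbol, *arguments)`, Ω is the constant `("Omega",)`, and `unfold` is a hypothetical helper that applies the body rule a fixed number of times along the recursion before applying the Ω rule.

```python
# Sketch only: variables are str, applications are tuples (symbol, *args).

OMEGA = ("Omega",)

# Mod(k, n) = if(<(k, n), k, Mod(-(k, n), n))   (table 7.2)
MOD_PARAMS = ("k", "n")
MOD_BODY = ("if", ("<", "k", "n"), "k", ("Mod", ("-", "k", "n"), "n"))

def substitute(t, sigma):
    """Apply substitution sigma (dict: variable -> term) to term t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(substitute(a, sigma) for a in t[1:])

def unfold(t, depth):
    """Rewrite calls of Mod by the body `depth` times, then by Omega."""
    if isinstance(t, str):
        return t
    if t[0] == "Mod":
        if depth == 0:
            return OMEGA                           # rule Mod(k, n) -> Omega
        body = substitute(MOD_BODY, dict(zip(MOD_PARAMS, t[1:])))
        return unfold(body, depth - 1)             # rule Mod(k, n) -> body
    return (t[0],) + tuple(unfold(a, depth) for a in t[1:])

nine, eight = ("9",), ("8",)
call = ("Mod", nine, eight)        # instantiated call: k = 9, n = 8
term0 = unfold(call, 0)            # Omega
term2 = unfold(call, 2)            # two unfoldings, then Omega
```

Both `term0` and `term2` are ground terms over the signature extended by Ω and contain no call of Mod, as required for elements of the language of the subprogram under this instantiation.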

Definition 7.1.21 (Language of a Subprogram). Let $\mathcal{S} = (\mathcal{G}, t_0)$ be an
RPS. For an equation $G_i \in \mathcal{G}$ the rewrite relation $\to_{G_i,\Omega}$ is implied by the
rules

$$\{G_i(x_1, \ldots, x_{m_i}) \to t_i,\ G_i(x_1, \ldots, x_{m_i}) \to \Omega\}.$$

Let $\beta : \{x_1, \ldots, x_{m_i}\} \to T_\Sigma$ be an instantiation of parameters of $G_i$, then the
set of all terms $\mathcal{L}(G_i, \beta) = \{t \mid t \in T_{\Sigma \cup \{\Omega\} \cup \Phi \setminus \{G_i\}},\ \beta(G_i(x_1, \ldots, x_{m_i})) \to^*_{G_i,\Omega} t\}$
is the language generated by subprogram $G_i$ with instantiation $\beta$.

The term given on the left side of figure 7.4 is an example for a term
belonging to the language of the RPS given in table 7.2 as well as an example
for a term belonging to the language of the subprogram ModList. The term
represents an unfolding of the main program $t_0$ where the ModList
subprogram is called. For $t_0$ = ModList(l, n) variables are instantiated as l = [9, 4, 7]
and n = 8.² Within this unfolding, there is an unfolding of the Mod
subprogram which is called by ModList. At the same time, the term represents an
unfolding of the subprogram ModList itself. The term given on the right side
of figure 7.4 is an unfolding of ModList containing the call of the Mod
subprogram. Neither of the two terms contains the name of the subprogram ModList
itself.

² For a restricted signature with natural number 0 and successor-function only and
list constructor cons, a natural number $i$ can be represented as $succ^i(0)$ and a
list as cons-expression.

$t_0$ = ModList(l, n), $\beta(l)$ = [9, 4, 7], $\beta(n)$ = 8

Fig. 7.4. Examples for Terms Belonging to the Language of an RPS and of a
Subprogram of an RPS



7.1.3.3 RPSs as Context-Free Tree Grammars. In the remainder of this
chapter, we will rely on the notion of RPSs as term rewrite systems. In this
section we will briefly present an alternative, equivalent definition of RPSs as
context-free tree grammars (Guessarian, 1992).

Definition 7.1.22 (Context-free Tree Grammar). A context-free tree
grammar (CFTG), $C = (A, N, \Sigma, P)$, is a four-tuple of

- a set of terminal symbols $\Sigma$,

- a set of non-terminal symbols $N$ with fixed arity and $N \cap \Sigma = \emptyset$,

- an axiom $A \in T_{\Sigma \cup N}$, and

- a set of production rules $P$ of the form $G(x_1, \ldots, x_n) \to t$ with $G \in N$ and
$\{x_1, \ldots, x_n\} \subseteq X$, where $X$ with $X \cap (\Sigma \cup N) = \emptyset$ is a set of reserved variables,
and $t \in T_{\Sigma \cup N}(X)$.

If there exists for a non-terminal $G \in N$ more than one rule $G(x_1, \ldots, x_n) \to
t_1, \ldots, G(x_1, \ldots, x_n) \to t_m$ we write $G(x_1, \ldots, x_n) \to t_1 \mid \ldots \mid t_m$ for short.

The language of a context-free tree grammar consists of all words (terms) 
which can be derived from the axiom and which do not contain non-terminal 
symbols. 

Definition 7.1.23 (Language of a CFTG). Let $C = (A, N, \Sigma, P)$ be a
CFTG and $\to^*_P$ the reflexive and transitive closure of the derivation relation
$\to_P$. The language $\mathcal{L}(C)$ of the grammar is $\mathcal{L}(C) = \{t \in T_\Sigma \mid A \to^*_P t\}$.

The program call $t_0$ of an RPS can be interpreted as axiom of a CFTG
and the equations of an RPS as production rules. In contrast to an RPS, the
axiom must contain no variables, that is, all variables in the program call
must be instantiated.

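Definition 7.1.24 (CFTG of an RPS). Let $\mathcal{S} = (\mathcal{G}, t_0)$ be an RPS and
$\beta : X \to T_\Sigma$ an instantiation of variables in $t_0$ ($X = \mathrm{var}(t_0)$), then the
CFTG $C_{\mathcal{S},\beta}$ of $\mathcal{S}$ and $\beta$ is defined as $C_{\mathcal{S},\beta} = (\beta(t_0), \Phi, \Sigma, P)$ with

$$P = \{G_1(x_1, \ldots, x_{m_1}) \to t_1 \mid \Omega,\ \ldots,\ G_n(x_1, \ldots, x_{m_n}) \to t_n \mid \Omega\}.$$

With such a CFTG the language generated by an RPS can be defined as
$\mathcal{L}(\mathcal{S}, \beta) = \mathcal{L}(C_{\mathcal{S},\beta})$.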

7.1.3.4 Initial Programs and Unfoldings. By means of the definition of
an RPS as term rewrite system (see def. 7.1.19) and the language of an RPS
induced by this system (see def. 7.1.20) we provide a method to unfold
an RPS into a term of finite length. Because all equations in an RPS must
be recursive, such a finite term contains at least one symbol $\Omega$ (termination
of rewriting). Now we can characterize the input to our synthesis system as a
term of which we assume that it is generated by a (yet unknown) recursive
program scheme.

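Definition 7.1.25 (Initial Program). An initial program is a term
$t \in T_{\Sigma \cup \{\Omega\}}(X)$ which contains at least one Ω:
$\mathrm{pos}(t, \Omega) \neq \emptyset$. The term might be defined over
instantiated variables only, that is, $t \in T_{\Sigma \cup \{\Omega\}}$.

Definition 7.1.26 (Order over Initial Programs). Let
$t, t' \in T_{\Sigma \cup \{\Omega\}}(X)$ be two terms. An order
$t \le_\Omega t'$ is inductively defined as

1. $\Omega \le_\Omega t'$, if $\mathrm{pos}(t', \Omega) \neq \emptyset$,

2. $x \le_\Omega t'$, if $x \in X$ and $\mathrm{pos}(t', \Omega) \neq \emptyset$,

3. $f(t_1, \ldots, t_n) \le_\Omega f(t'_1, \ldots, t'_n)$, if for all
   $i \in \{1, \ldots, n\}$ holds $t_i \le_\Omega t'_i$.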
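The three rules of definition 7.1.26 can be turned into a small recursive check. The following is only a sketch under assumptions that are not part of the text: terms are encoded as nested Python tuples `(symbol, arg1, ..., argn)`, constants and variables as plain strings, Ω as the string `"Omega"`, and the set of variables $X$ is passed explicitly. The names `leq_omega` and `contains_omega` are hypothetical.

```python
OMEGA = "Omega"  # assumed encoding of the undefined symbol


def contains_omega(term):
    """True if pos(term, Omega) is non-empty."""
    if term == OMEGA:
        return True
    if isinstance(term, tuple):
        return any(contains_omega(arg) for arg in term[1:])
    return False


def leq_omega(t, t_prime, variables=frozenset()):
    """Check t <=_Omega t_prime following the three rules of def. 7.1.26."""
    # Rules 1 and 2: Omega (or a variable) is below every term containing an Omega.
    if t == OMEGA or (isinstance(t, str) and t in variables):
        return contains_omega(t_prime)
    # Rule 3: same head symbol and arity, arguments compared pairwise.
    # Constants are treated as function symbols of arity 0.
    t_head, t_args = (t[0], t[1:]) if isinstance(t, tuple) else (t, ())
    p_head, p_args = (t_prime[0], t_prime[1:]) if isinstance(t_prime, tuple) else (t_prime, ())
    if t_head != p_head or len(t_args) != len(p_args):
        return False
    return all(leq_omega(a, b, variables) for a, b in zip(t_args, p_args))


small = ("f", "a", OMEGA)
large = ("f", "a", ("f", "a", OMEGA))
print(leq_omega(small, large), leq_omega(large, small))
```

In this encoding the shorter term is below the longer one because its Ω sits where the longer term continues, but not the other way round.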

The task of folding an initial program involves finding a segmentation
and substitutions of the initial program which can be interpreted as the
first $i$ elements of an unfolding of a hypothetical RPS. Therefore, we now
introduce recursion points and substitution terms for an RPS.

For a given recursive equation, each occurrence of a recursive call in its
body constitutes a recursion point, and for each recursive call the
parameters are substituted by terms:

Definition 7.1.27 (Recursion Points and Substitution Terms). Let
$S = (\mathcal{G}, t_0)$ be an RPS, $G_i(x_1, \ldots, x_{m_i}) = t_i$ a
recursive equation in $S$, and $X_i = \{x_1, \ldots, x_{m_i}\}$ the set of
parameters of $G_i$. The non-empty set of recursion points $U_{rec}$ of
$G_i$ is given by $U_{rec} = \mathrm{pos}(t_i, G_i)$ with indices
(positions) $R = \{1, \ldots, |U_{rec}|\}$.
Each recursive call of $G_i$ at position $u_r \in U_{rec}$, $r \in R$, in
$t_i$ implies substitutions $\sigma_r : X_i \to T_\Sigma(X_i)$ of the
parameters in $G_i$. Let $\mathrm{sub} : X_i \times R \to T_\Sigma(X_i)$
be a mapping $\mathrm{sub}(x_j, r) = \sigma_r(x_j)$ for all $x_j \in X_i$,
then holds $\mathrm{sub}(x_j, r) = t_i|_{u_r \circ j}$. The terms
$\mathrm{sub}(x_j, r)$ are called substitution terms.

An example for recursion points and substitution terms for the Fibonacci
function is given in table 7.3. Fibonacci is tree recursive, that is, it
contains two recursive calls in its body. The positions of these calls
(see def. 7.1.3) in the term representing the body of Fibonacci are
$U_{rec} = \{3.3.1.\lambda,\ 3.3.2.\lambda\}$. For referring to these
recursion points we introduce an index set $R = \{1, \ldots, |U_{rec}|\}$,
that is, for Fibonacci $R = \{1, 2\}$. With $u_1 \in U_{rec}$ we refer
to the first recursive call and with $u_2 \in U_{rec}$ we refer to the
second recursive call. In the first recursive call, the parameter $x$ is
substituted by pred(x) and in the second recursive call, it is substituted
by pred(pred(x)).



Table 7.3. Recursion Points and Substitution Terms for the Fibonacci Function

Recursive Equation:
    G_1(x_1) = if(eq0(x_1),
                  1,
                  if(eq0(pred(x_1)),
                     1,
                     +(G_1(pred(x_1)), G_1(pred(pred(x_1))))))

Recursion Points:
    $U_{rec} = \{3.3.1.\lambda,\ 3.3.2.\lambda\}$

Recursive Calls:
    $t_1|_{3.3.1.\lambda}$ = G_1(pred(x_1)),
    $t_1|_{3.3.2.\lambda}$ = G_1(pred(pred(x_1)))

Substitution Terms for x_1:
    sub(x_1, 1) = $t_1|_{3.3.1.\lambda \circ 1}$ = G_1(pred(x_1))$|_{1.\lambda}$ = pred(x_1),
    sub(x_1, 2) = $t_1|_{3.3.2.\lambda \circ 1}$ = G_1(pred(pred(x_1)))$|_{1.\lambda}$ = pred(pred(x_1))
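The entries of table 7.3 can be recomputed mechanically. The sketch below assumes, purely for illustration, that terms are nested Python tuples `(symbol, arg1, ..., argn)` and that positions are tuples of 1-based argument indices, so that 3.3.1.λ becomes `(3, 3, 1)`. The helper names `positions_of`, `subterm` and `sub` are hypothetical.

```python
def positions_of(term, symbol, prefix=()):
    """All positions in `term` at which `symbol` occurs as head symbol: pos(term, symbol)."""
    found = []
    if isinstance(term, tuple):
        if term[0] == symbol:
            found.append(prefix)
        for i, arg in enumerate(term[1:], start=1):
            found.extend(positions_of(arg, symbol, prefix + (i,)))
    elif term == symbol:
        found.append(prefix)
    return found


def subterm(term, position):
    """The subterm term|_position; index i selects the i-th argument."""
    for i in position:
        term = term[i]
    return term


def sub(body, rec_points, j, r):
    """Substitution term sub(x_j, r) = body|_{u_r o j} (def. 7.1.27)."""
    return subterm(body, rec_points[r - 1] + (j,))


# Body of the Fibonacci equation of table 7.3.
fib_body = ("if", ("eq0", "x1"), "1",
            ("if", ("eq0", ("pred", "x1")), "1",
             ("+", ("G1", ("pred", "x1")), ("G1", ("pred", ("pred", "x1"))))))

u_rec = positions_of(fib_body, "G1")
print(u_rec)                       # recursion points
print(sub(fib_body, u_rec, 1, 1))  # substitution term for x1 in the first call
print(sub(fib_body, u_rec, 1, 2))  # substitution term for x1 in the second call
```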



For a given recursive equation there is a fixed set of recursion points. This 
set contains a single position for linear recursion, two positions for tree recur- 
sion, and for more complex RPSs the set can contain an arbitrary number of 
recursion points. If a given term is generated by unfolding a recursive equa- 
tion, the recursion points occur in a regular way in each unfolding. Because, 
theoretically, a recursive equation can be unfolded infinitely many times, 
there are infinitely many positions at which recursive calls occur, that is un- 
folding positions. We will describe such positions over indices in the following 
way: 

Definition 7.1.28 (Unfolding Indices). Let $S = (\mathcal{G}, t_0)$ be an
RPS, $G_i(x_1, \ldots, x_{m_i}) = t_i$ a recursive equation in $S$,
$X_i = \{x_1, \ldots, x_{m_i}\}$ the set of parameters of $G_i$, and
$U_{rec}$ the set of recursion points of $G_i$ with index set
$R = \{1, \ldots, |U_{rec}|\}$. The set of unfolding indices $W$
constructed over $R$ is inductively defined as the smallest set for which
holds:

1. $\lambda \in W$,

2. if $w \in W$ and $r \in R$, then $w \circ r \in W$.

Fig. 7.5. Unfolding Positions in the Third Unfolding of Fibonacci

To illustrate the concept of unfolding indices, we anticipate the concept
of unfolding which will be introduced in the next definition. Consider the
Fibonacci term in figure 7.5 which represents the third syntactic unfolding
of the recursive function given in table 7.3. The zero-th unfolding would be
just the empty term Ω, and its position is λ, that is, the root of the tree.
The first unfolding renders two unfolding positions, 3.3.1.λ and 3.3.2.λ,
which are identical with the recursion points of the Fibonacci function.
Unfolding the function at each of these positions again by one step results
in four new unfolding positions, 3.3.1.3.3.1.λ, 3.3.1.3.3.2.λ,
3.3.2.3.3.1.λ, and 3.3.2.3.3.2.λ, and so on. Just as we refer to the
recursion points using the index set $R$, we refer to the unfolding
positions using an index set $W$. The unfolding positions and the
corresponding indices for the first three unfoldings of Fibonacci are given
in table 7.4. For a tree recursive structure such as Fibonacci, the indices
grow in a treelike fashion. For example, index 1.1.2.λ refers to the right
unfolding position in the two left unfoldings on the levels above.



Table 7.4. Unfolding Positions and Unfolding Indices for Fibonacci

Unfolding   Unfolding Positions                          Unfolding Indices
0           λ                                            λ
1           3.3.1.λ, 3.3.2.λ                             1.λ, 2.λ
2           3.3.1.3.3.1.λ, 3.3.1.3.3.2.λ,                1.1.λ, 1.2.λ,
            3.3.2.3.3.1.λ, 3.3.2.3.3.2.λ                 2.1.λ, 2.2.λ
3           3.3.1.3.3.1.3.3.1.λ, 3.3.1.3.3.1.3.3.2.λ,    1.1.1.λ, 1.1.2.λ,
            3.3.1.3.3.2.3.3.1.λ, 3.3.1.3.3.2.3.3.2.λ,    1.2.1.λ, 1.2.2.λ,
            3.3.2.3.3.1.3.3.1.λ, 3.3.2.3.3.1.3.3.2.λ,    2.1.1.λ, 2.1.2.λ,
            3.3.2.3.3.2.3.3.1.λ, 3.3.2.3.3.2.3.3.2.λ     2.2.1.λ, 2.2.2.λ
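The correspondence between unfolding indices and unfolding positions in table 7.4 can be sketched as follows. The encoding is an assumption of this sketch: an index $w \in W$ is a tuple over $R$, a position is a tuple of argument indices, and the position belonging to an index is obtained by concatenating the recursion points it names. The function names are hypothetical.

```python
from itertools import product


def unfolding_indices(num_rec_points, depth):
    """All unfolding indices of length `depth` over R = {1, ..., num_rec_points}."""
    return list(product(range(1, num_rec_points + 1), repeat=depth))


def unfolding_position(index, rec_points):
    """Concatenate the recursion points selected by the index, e.g. 1.2 -> u_1 followed by u_2."""
    position = ()
    for r in index:
        position += rec_points[r - 1]
    return position


fib_rec_points = [(3, 3, 1), (3, 3, 2)]  # recursion points of Fibonacci (table 7.3)

for depth in range(3):
    for w in unfolding_indices(len(fib_rec_points), depth):
        print(depth, w, unfolding_position(w, fib_rec_points))
```

For a linear recursive equation the same functions yield exactly one index and one position per unfolding depth.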



Defining the unfolding of a recursive equation is based on the positions 
in a term at which unfoldings are performed and on a mechanism to replace 
parameters in the program body: 

Definition 7.1.29 (Unfolding of a Recursive Equation). Let
$S = (\mathcal{G}, t_0)$ be an RPS, $G_i(x_1, \ldots, x_{m_i}) = t_i$ a
recursive equation in $S$, $X_i = \{x_1, \ldots, x_{m_i}\}$ the set of
parameters of $G_i$, $U_{rec}$ the set of recursion points of $G_i$ with
index set $R = \{1, \ldots, |U_{rec}|\}$, and $\mathrm{sub}(x_j, r)$ the
substitution term for $x_j \in X_i$ in the recursive call of $G_i$ at
position $u_r \in U_{rec}$, $r \in R$. The set of all unfoldings
$\Upsilon_i$ of equation $G_i$ over instantiations
$\beta : X_i \to T_\Sigma$ is inductively defined as the smallest set for
which holds:

1. Let $\beta_\lambda = \beta$ be the instantiation of variables in
   unfolding $\lambda$. Then $\upsilon_\lambda = \beta_\lambda(t_i)$ and
   $\upsilon_\lambda \in \Upsilon_i$.

2. Let $\upsilon_w \in \Upsilon_i$ be an unfolding with instantiation
   $\beta_w$. Then for each $r \in R$ there exists an unfolding
   $\upsilon_{w \circ r} \in \Upsilon_i$ with instantiations
   $\beta_{w \circ r}(x_j) = \beta_w(\mathrm{sub}(x_j, r))$ for all
   $x_j \in X_i$, such that $\upsilon_{w \circ r} = \beta_{w \circ r}(t_i)$.

An unfolding is a term which is generated in a derivation step
$t \to_{S,\Omega} t'$ by applying a term rewrite rule
$G_i(x_1, \ldots, x_{m_i}) \to t_i$ (see def. 7.1.19). The resulting term
is an element of the language $\mathcal{L}(G_i, \beta)$ (see def. 7.1.21).
For a term $t$ which is the result of repeated application of rule
$\beta(G_i(x_1, \ldots, x_{m_i}))$, the unfolding indices $w$ correspond
to the sequence in which the recursive calls were replaced.

Please note that the initial instantiation of variables $\beta_\lambda$
can be empty, that is, the variables appearing in the program body can
remain uninstantiated during unfolding. For syntactical unfolding, as
described above, there is no possibility to terminate unfolding for paths
which can never be reached during program evaluation. For example, in
figure 7.5 at each level of unfolding positions, all recursive calls were
unfolded. It is not checked whether one of the conditions eq0(x_1) or
eq0(pred(x_1)) is already true for the given instantiation of x_1.
Unfolding of terms is a well-established concept, which we already
described in chapter 6 (sects. 6.2.2.1 and 6.3.4.1). In the definition
above, we simply reformulated unfolding in our terminology.
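Purely syntactic unfolding in the sense of definition 7.1.29 can be sketched in a few lines. The sketch assumes terms encoded as nested Python tuples, Ω encoded as the string `"Omega"`, and it truncates every remaining recursive call with Ω once the requested depth is reached; the names `unfold`, `substitute` and `omega_positions` are hypothetical. As in the text, no condition is evaluated, so all paths are unfolded.

```python
OMEGA = "Omega"  # assumed encoding of the undefined symbol


def substitute(term, binding):
    """Apply an instantiation (dict: variable -> term) to a term."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, binding) for a in term[1:])
    return binding.get(term, term)


def unfold(body, name, params, depth, binding=None):
    """Unfold `name(params) = body` syntactically `depth` times; depth 0 yields Omega."""
    binding = {} if binding is None else binding
    if depth == 0:
        return OMEGA

    def expand(term):
        if isinstance(term, tuple):
            if term[0] == name:
                # beta_{w o r}(x_j) = beta_w(sub(x_j, r)) for the next unfolding
                nxt = {x: substitute(arg, binding) for x, arg in zip(params, term[1:])}
                return unfold(body, name, params, depth - 1, nxt)
            return (term[0],) + tuple(expand(a) for a in term[1:])
        return binding.get(term, term)

    return expand(body)


def omega_positions(term, prefix=()):
    """pos(term, Omega) as a list of index tuples."""
    if term == OMEGA:
        return [prefix]
    if isinstance(term, tuple):
        return [p for i, a in enumerate(term[1:], start=1)
                for p in omega_positions(a, prefix + (i,))]
    return []


fib_body = ("if", ("eq0", "x1"), "1",
            ("if", ("eq0", ("pred", "x1")), "1",
             ("+", ("G1", ("pred", "x1")), ("G1", ("pred", ("pred", "x1"))))))

second = unfold(fib_body, "G1", ("x1",), 2)
print(omega_positions(second))
```

With this encoding, the Ω positions of the second unfolding coincide with the unfolding positions listed for unfolding 2 in table 7.4.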

The synthesis problem is solved successfully if an RPS with a set of
recursive equations can be induced which can recurrently explain a given
initial program.

Definition 7.1.30 ((Recursive) Explanation of an Initial Program).
A recursive equation $G_i(x_1, \ldots, x_{m_i}) = t_i$ explains an initial
program $t_{init}$, if there exist an instantiation
$\beta : \{x_1, \ldots, x_{m_i}\} \to T_\Sigma$ of the parameters of $G_i$
and a term $t \in \mathcal{L}(G_i, \beta)$, such that
$t_{init} \le_\Omega t$. $G_i$ is a recurrent explanation of $t_{init}$ if
furthermore there exists a term $t' \in \mathcal{L}(G_i, \beta)$ which can
be derived by at least two applications of
$G_i(x_1, \ldots, x_{m_i}) \to t_i$ such that $t' \le_\Omega t_{init}$.
An RPS $S = (\mathcal{G}, t_0)$ explains an initial program $t_{init}$, if
there exist an instantiation $\beta : \mathrm{var}(t_0) \to T_\Sigma$ of
the parameters of the program call $t_0$ and a term
$t \in \mathcal{L}(S, \beta)$, such that $t_{init} \le_\Omega t$. $S$ is a
recurrent explanation of $t_{init}$ if furthermore there exists a term
$t' \in \mathcal{L}(S, \beta)$ which can be derived by at least two
applications of all rules $\mathcal{R}_S$ such that
$t' \le_\Omega t_{init}$.

An equation/RPS explains a set of initial programs, if it explains all terms 
in the set and if there is a recurrent explanation for at least one term. 

Definition 7.1.30 states that a recursive equation, or an RPS defined over
a set of such equations (subprograms) together with an initial program
call, explains an initial program if some term $t$ can be derived such that
the initial program is completely contained in this term (see def. 7.1.26).
For a recurrent explanation it must additionally hold that some term $t'$
can be derived by repeated unfolding (that is, at least two times) such
that this term $t'$ is completely contained in the initial program.

7.2 Synthesis of RPSs from Initial Programs

Based on the definitions of RPSs, initial programs, unfoldings, and
recurrent explanations introduced above, we are now ready to formulate
precisely the synthesis problem which we want to solve. Before we do that,
we will first discuss the relation between the fixpoint semantics of
recursive functions and induction of recursive functions from initial
programs and give some characteristics of RPSs.

7.2.1 Folding and Fixpoint Semantics 

As discussed in chapter 6, induction can be viewed as the inverse process
to deduction. For example, in inductive logic programming, more general
clauses can be computed by inverse resolution, and a term or clause can be
generalized by anti-unification (Plotkin, 1969). Summers (1977) proposed
that folding an initial program into a (set of) recursive equation(s) can
be seen as the inverse of fixpoint semantics (Field & Harrison, 1988;
Davey & Priestley, 1990). Summers' synthesis theorem is given in section
6.3.4.1 in chapter 6.

We introduce fixpoint semantics in appendix B.1. In short, fixpoint
semantics allows us to give a denotation to recursively defined syntactic
functions. The semantics of a (continuous) function (over an ordered
domain) is defined as the least upper bound (supremum) of a sequence of
unfoldings (expansions) of this function.

For folding, we consider a given finite program term as the $i$-th
unfolding $t_{init} = G^i$ of an unknown recursively defined syntactic
function $G$. Finding a recursive explanation for $G^i$ as described in
definition 7.1.30 corresponds to segmenting $G^i$ into a sequence of
unfoldings $G^0, G^1, \ldots, G^i$ where
$G^k = t_k[u \leftarrow G^{k-1}\sigma]$, with $u$ as a fixed position in
$t_k$ and $\sigma$ as substitution of parameters, holds for
$k = 1, \ldots, i$. In that case, based on the recurrence relation which
holds for $G^i$, the recursive function $G = t[u \leftarrow G\sigma]$ can
be extrapolated. An example is given in table 7.5.

7.2.2 Characteristics of RPSs 

In section 7.1.3.1 we already introduced two restrictions for RPSs: All
variables given in the head of a recursive equation must appear in the body
and each equation contains at least one call of itself. In the following,
further characteristics of the RPSs which can be folded using our approach
are given.

Definition 7.2.1 (Dependency Relation of Subprograms). Let
$S = (\mathcal{G}, t_0)$ be an RPS with a set of function variables (names
of subprograms) $\Phi$. Let $H \notin \Phi$ be a function variable (for the
unnamed main program). A relation
$\mathit{calls}_S \subseteq (\{H\} \cup \Phi) \times \Phi$ between
subprograms and main program of $S$ is defined as:

$$\mathit{calls}_S = \{(H, G_i) \mid G_i \in \Phi,\ \mathrm{pos}(t_0, G_i) \neq \emptyset\} \ \cup\ \{(G_i, G_j) \mid G_i, G_j \in \Phi,\ \mathrm{pos}(t_i, G_j) \neq \emptyset\}.$$

The transitive closure $\mathit{calls}^*_S$ of $\mathit{calls}_S$ is the
smallest set $\mathit{calls}^*_S \subseteq (\{H\} \cup \Phi) \times \Phi$
for which holds:

1. $\mathit{calls}_S \subseteq \mathit{calls}^*_S$,

2. for all $P \in \{H\} \cup \Phi$ and $G_i, G_j \in \Phi$: If
   $P\ \mathit{calls}^*_S\ G_i$ and $G_i\ \mathit{calls}^*_S\ G_j$ then
   $P\ \mathit{calls}^*_S\ G_j$.

Table 7.5. Example of Extrapolating an RPS from an Initial Program

– Initial program:

    t_init = G^3 = if(<(k, n), k,
                      if(<(-(k, n), n), -(k, n),
                         if(<(-(-(k, n), n), n), -(-(k, n), n), Ω)))

– Segmentation into hypothetical unfoldings of a recursive function G:

    G^0 =_def Ω
    G^1 = if(<(k, n), k, Ω)
    G^2 = if(<(k, n), k, if(<(-(k, n), n), -(k, n), Ω))
    G^3 = t_init

– Recurrence Relation and Extrapolation:

    $G^1 = if(<(k, n), k, G^0_{[k \leftarrow -(k, n)]})$
    $G^2 = if(<(k, n), k, G^1_{[k \leftarrow -(k, n)]})$
    $G^3 = if(<(k, n), k, G^2_{[k \leftarrow -(k, n)]})$
    extrapolate:
    $G = if(<(k, n), k, G_{[k \leftarrow -(k, n)]})$

For introducing a notion of minimality of RPSs two further restrictions
are necessary:

Only Primitive Recursion: All recursive equations are primitive recursive,
that is, there are no recursive calls in substitutions.

No Mutual Recursion: There are no recursive equations $G_i, G_j$ with
$i \neq j$ with $G_i\ \mathit{calls}^*_S\ G_j$ and
$G_j\ \mathit{calls}^*_S\ G_i$.

The first restriction was already given implicitly in definition 7.1.27
where sub maps into $T_\Sigma(X_i)$. A consequence of this restriction is
that we cannot infer general μ-recursive functions, such as the Ackermann
function. The second restriction is only syntactical since each pair of
mutually recursive functions can be transformed into semantically
equivalent functions which are not mutually recursive.
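The dependency relation and the restriction to non-mutual recursion can be checked mechanically. The sketch below makes several assumptions for illustration only: terms are nested Python tuples, the unnamed main program is represented by the string `"H"`, and the two example schemes (a ModList-like scheme calling Mod, and an Even/Odd pair) are hypothetical stand-ins rather than the RPSs of the text.

```python
def symbols(term):
    """Yield every symbol occurring in a term."""
    if isinstance(term, tuple):
        yield term[0]
        for arg in term[1:]:
            yield from symbols(arg)
    else:
        yield term


def calls_relation(main, equations):
    """calls_S: pairs (caller, callee) read off the main program and the equation bodies."""
    names = set(equations)
    relation = {("H", g) for g in names if g in set(symbols(main))}
    for g_i, body in equations.items():
        relation |= {(g_i, g_j) for g_j in names if g_j in set(symbols(body))}
    return relation


def transitive_closure(relation):
    """Smallest transitive relation containing `relation`."""
    closure = set(relation)
    while True:
        new = {(p, g_j) for (p, g_i) in closure for (g_k, g_j) in closure if g_i == g_k}
        if new <= closure:
            return closure
        closure |= new


def mutually_recursive(closure):
    """Pairs of distinct subprograms that call each other (directly or indirectly)."""
    return {(a, b) for (a, b) in closure if a != b and (b, a) in closure}


mod_list = {  # hypothetical scheme: ModList calls itself and Mod
    "ModList": ("if", ("empty", "l"), "nil",
                ("cons", ("Mod", ("head", "l"), "n"), ("ModList", ("tail", "l"), "n"))),
    "Mod": ("if", ("<", "k", "n"), "k", ("Mod", ("-", "k", "n"), "n")),
}
even_odd = {  # hypothetical mutually recursive pair, excluded by the restriction
    "Even": ("if", ("eq0", "x"), "true", ("Odd", ("pred", "x"))),
    "Odd": ("if", ("eq0", "x"), "false", ("Even", ("pred", "x"))),
}

closure = transitive_closure(calls_relation(("ModList", "l", "n"), mod_list))
print(sorted(closure))
print(mutually_recursive(transitive_closure(calls_relation(("Even", "x"), even_odd))))
```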

Definition 7.2.2 (Minimality of an RPS). Let $S = (\mathcal{G}, t_0)$ be
an RPS. $S$ is minimal if for all subprograms
$G_i(x_1, \ldots, x_{m_i}) = t_i$, $G_i \in \Phi$,
$X_i = \{x_1, \ldots, x_{m_i}\}$ holds:

No unused subprograms: $H\ \mathit{calls}^*_S\ G_i$.

No unused parameters: For each instantiation $\beta : X_i \to T_\Sigma$
and instantiations $\beta_j : X_i \to T_\Sigma$, $j = 1, \ldots, m_i$,
constructed as

$$\beta_j(x_k) = \begin{cases} t & k = j,\ t \in T_\Sigma,\ t \neq \beta(x_j) \\ \beta(x_k) & k \neq j \end{cases}$$

holds $\mathcal{L}(G_i, \beta) \neq \mathcal{L}(G_i, \beta_j)$.

No identical parameters: For all $\beta : X_i \to T_\Sigma$ and all
$x_i, x_j \in X_i$ holds: For all unfoldings $\upsilon_w \in \Upsilon_i$,
$w \in W$ (see def. 7.1.29) with instantiation $\beta_w$ of variables
follows $i = j$ from $\beta_w(x_i) = \beta_w(x_j)$.

Note, that similar restrictions were formulated by (Le Blanc, 1994). To 
sum up the given criteria for minimality: An RPS is minimal if it only contains 
recursive equations which are called at least once, if each parameter in the 
head of an equation is used at least once for instantiating a parameter in the 
body of the equation, and if there exist no parameters with different names 
but (always) identical instantiation. It is obvious, that each RPS which does 
not fulfill these criteria can be transformed into one which does fulfill them 
by deleting unused equations and parameters and by unifying parameters 
with different names but identical instantiations. That is, these criteria do 
not restrict the expressiveness of RPSs. 

A stricter definition of minimality could additionally consider the size
(i. e., number of symbols) of the subprogram bodies (Osherson, Stob, &
Weinstein, 1986). But this only makes sense for RPSs with a single
recursive equation. In that case, minimality of a subprogram $G$ is given
if there exists no subprogram $G'$ with
$\mathcal{L}(G', \beta) \subset \mathcal{L}(G, \beta)$ (Mühlpfordt &
Schmid, 1998). For RPSs with more than one equation, the minimality of each
single equation cannot be determined by comparison of languages: If
equation $G$ calls other equations $G_i$ there is always a smaller language
where these calls are replaced by constant symbols. Therefore, it would be
necessary to define minimality over the size of all equation bodies and the
term representing the main program. There are too many degrees of freedom
to define a tight criterion: For example, is an RPS consisting of a small
main program but complex subprograms smaller than an RPS consisting of a
complex main program and small subprograms?

To guarantee uniqueness for calculating the substitutions in a subprogram 
body for a given initial program, we introduce the notion of “substitution 
uniqueness” : 

Definition 7.2.3 (Substitution Uniqueness of an RPS). Let
$T_{init} \subset T_{\Sigma \cup \{\Omega\}}$ be a set of initial programs
and $S = (\mathcal{G}, t_0)$ an RPS over $\Sigma$ and $\Phi$ which explains
$T_{init}$ recursively (see def. 7.1.30). $S$ is called substitution unique
with regard to $T_{init}$ if there exists no $S'$ over $\Sigma$ and $\Phi$
which explains $T_{init}$ recursively and for which holds:

– $t_0 = t'_0$,

– for all $G_i \in \Phi$ holds
  $\mathrm{pos}(t_i, G_i) = \mathrm{pos}(t'_i, G_i) = U_{rec}$,
  $t'_i[U_{rec} \leftarrow \Omega] = t_i[U_{rec} \leftarrow \Omega]$, and
  there exists an $r \in R$ with
  $\mathrm{sub}(x_j, r) \neq \mathrm{sub}'(x_j, r)$ (see def. 7.1.27).

Substitution uniqueness guarantees that it is not possible to replace a
substitution term in $S$ such that the resulting RPS $S'$ still explains a
given set of initial programs recursively.

7.2.3 The Synthesis Problem 

Now all preliminaries are given to state the synthesis problem: 

Definition 7.2.4 (Synthesis Problem). Let
$T_{init} \subset T_{\Sigma \cup \{\Omega\}}$ be a set of initial programs
with indices $I = \{1, \ldots, |T_{init}|\}$. The synthesis problem is to
induce

– a signature $\Sigma$,

– a set of function variables $\Phi = \{G_1, \ldots, G_n\}$,

– a minimal RPS $S = (\mathcal{G}, t_0)$ with a main program
  $t_0 \in T_{\Sigma \cup \Phi}(X)$ and a set of recursive equations
  $\mathcal{G} = \langle G_1(x_1, \ldots, x_{m_1}) = t_1, \ldots, G_n(x_1, \ldots, x_{m_n}) = t_n \rangle$

such that

– $S$ recursively explains $T_{init}$ (def. 7.1.30), and

– $S$ is substitution unique (def. 7.2.3).

In the following section, an algorithm for solving the synthesis problem
will be presented. Identification of a signature $\Sigma$ is dealt with
only implicitly: the RPS is constructed exactly over those symbols which
are necessary for explaining the initial programs recursively. Because the
folding mechanism should be independent of the way in which the initial
programs were constructed (by a planner or as input of a system user), we
allow for incomplete unfoldings and we allow instantiated initial programs
as well as programs over parameters (i. e., generic traces), where
parameters are treated just like constants.

If synthesis starts with only a single initial program, the main program
can contain no parameters (except the parameters which were identified
as “constants”). If a parameterized main program is searched for (which is
the usual case), we assume that all parameters inferred for the subprograms
which are called from the main program are also parameters of the main
program itself.



7.3 Solving the Synthesis Problem 

In this section, the algorithms for constructing an RPS from a set of
initial programs (trees) are presented. For each given initial tree, in a
first step, a segmentation is constructed (sect. 7.3.1), which corresponds
to the unfoldings (see def. 7.1.29) of a hypothetical subprogram. As
intermediate steps, first a set of hypothetical recursion points (see def.
7.1.27) is identified and a skeleton of the body of the searched-for
subprogram is constructed.

If recursion points do not explain the complete initial tree, all
unexplained subtrees are considered as a new set of initial trees for which
the synthesis algorithm is called recursively, and the subtrees are
replaced by the name $G_i$ of the subprogram to be inferred (sect. 7.3.3).
If an initial tree cannot be segmented starting from the root, a constant
initial part is identified and included in the body of the calling
function, that is, in $t_0$ for top-level subprograms and in $G_i$ if
calls($G_i$, $G_j$) holds and $G_j$ is the currently considered
hypothetical subprogram (see figure 7.13 in section 7.3.5.2).

For a given segmentation, the maximal pattern is constructed by including 
all parts of initial trees which are constant over all segments into the subpro- 
gram body (sect. 7.3.2). All remaining parts of the segments are considered 
as parameters and their substitutions. That is, in a last step, a unique substi- 
tution rule must be calculated and as a consequence, the parameter instantia- 
tions of the function call are identified (sect. 7.3.4). The only backtrack-point 
for folding is calculating a valid segmentation for the initial trees. 

We begin each subsection with an illustration. For readers not interested 
in the formal details, it is enough to study the illustration for getting the 
general idea of our approach. 

7.3.1 Constructing Segmentations 

In the following, we first give an informal illustration. Then, we introduce
some concepts for constructing segmentations of initial programs. Afterwards,
we introduce criteria which must hold for a segmentation to be valid. Finally,
we present an algorithm for constructing valid segmentations.

7.3.1.1 Segmentation for ‘Mod’. Consider the simple RPS for calculating
Mod(k, n), which consists of a single recursive equation and where the
main program $t_0$ is just the call of this equation:

    $S = (\mathcal{G}, t_0)$
    $\mathcal{G}$ = ⟨ Mod(k, n) = if(<(k, n), k, Mod(-(k, n), n)) ⟩
    $t_0$ = Mod(k, n)

Remember that all recursive equations are considered as “subprograms”.
The main program, in general, can be a term which calls one or more of the
recursive subprograms of the RPS (see sect. 7.1.3.1).

In figure 7.6 a term is given which can be folded into this RPS. For better
readability, we do not give an initial instantiation for parameters k and n.
The first and crucial step of folding is to identify a valid recurrent
segmentation of an initial program term $t_{init}$. For the given term,
such a segmentation is marked in the figure.

As a first step, all paths in the term leading to Ωs are identified; that
is, paths to positions where the unfolding of the hypothetical (searched-for)
RPS was stopped. These paths constitute the skeleton of the term, which is
defined as the minimal pattern of the term (see def. 7.1.11) which contains
all Ωs and where all sub-terms which are not on paths to Ωs are ignored.

Fig. 7.6. Valid Recurrent Segmentation of Mod



Then, a set of hypothetical recursion points $U_{rec}$ (see def. 7.1.27) is
generated. There are different possible strategies to obtain such a set
from $t_{init}$. The simplest hypothesis is that, beginning at the root,
each node in the skeleton constitutes an unfolding position. For our
example, this hypothesis is $U_{rec} = \{3.\lambda\}$. We can construct
hypothetical unfolding indices $W$ from $U_{rec}$ (see def. 7.1.28) which
leads to the segmentation given in figure 7.5.

Replacing all subtrees at positions $U_{rec}$ in the skeleton by Ωs results
in a hypothesis about the program body $t_{skel}$. For our example, the
segmentation induced by $U_{rec} = \{3.\lambda\}$ is valid because each
segment consists of the same (non-trivial) pattern and because all Ωs (here
only a single one) can be reached and this Ω has a position which
corresponds to a recursion point (unfolding position). The valid
segmentation is recurrent because the skeleton of the upper-most segment,
$t_{skel}$, constitutes a non-trivial pattern for all subtrees of the
initial program at positions $u$ where $u$ is a hypothetical recursion
point. The notion of a valid recurrent segmentation follows directly from
the definition of a recursive explanation of a recursive program (see def.
7.1.30).

If a valid recurrent segmentation can be found, the result of this first
step of folding is a skeleton of the subprogram body. For our example it is
if(y, y’, G), where G is a new function variable in $\Phi$.

As stated in the definition of the synthesis problem (see def. 7.2.4), input
into the folding algorithm is a set of initial programs $T_{init}$. That
is, a user or another system can provide more than one example program, all
of which are considered as unfoldings of the same searched-for RPS with
different instantiations and/or different unfolding depths. In our example,
$T_{init}$ consisted of a single initial program. For some definitions
below, it is enough to consider a single program $t_{init} \in T_{init}$;
for others, it is important that the complete set is taken into account.

In general, finding a valid recurrent segmentation can be much more
complicated than illustrated for Mod:

– It might not be possible to find a valid recurrent segmentation starting
  at the root of the initial tree, that is, there exists a constant initial
  part of the term which belongs to the main program $t_0$. In that case,
  this initial part is introduced in $t_0$ and segmentation starts with
  $T_{init}$ as the set of all sub-terms of this initial part.

– The recurrence relation underlying the term can be of arbitrary
  complexity. That is, in general, the skeleton does not just consist of a
  single path. Our strategy for constructing segmentations is to find
  segmentations which start as high in the given tree as possible and which
  cover as large a part of the tree as possible. As we will see in the
  algorithm below, we search for hypothetical recursion points “from left
  to right” in the tree.

– There might exist sub-terms in the given initial program which contain Ωs
  but which cannot be covered within any valid recurrent segmentation. In
  this case, it is assumed that these subtrees belong to calls of a further
  recursive equation. The positions of such subtrees are considered as
  “sub-schema positions” $U_{sub}$, and finding valid recursive
  segmentations for these subtrees is dealt with by starting the folding
  algorithm again with these subtrees as a new set of initial trees.

– We allow for incomplete unfoldings. That is, some paths in the initial
  program are truncated before an unfolding is complete. For example, for
  the recursive equation

      addlist(l) = if(empty(l), 0, +(head(l), addlist(tail(l))))

  the last unfolding could result in an Ω at the position of +, because the
  else case is not considered if list l is empty. Although we consider
  unfolding as a purely syntactical process, it might be useful to take
  into account that construction of the initial program was based on some
  kind of semantic information (such as evaluation of expressions).

  Consequently, when constructing a segmentation, it is allowed that a
  segment contains an Ω above the hypothetical unfolding position (but not
  below!).

All these cases are covered in our approach which we will introduce now 
more formally. 

7.3.1.2 Segmentation and Skeleton. The first and crucial step for inducing
an RPS from a set of initial programs is the identification of a valid,
recurrent segmentation of the initial trees. When searching for a
segmentation, at first only such nodes of the initial trees are considered
which are lying on paths leading to hypothetical recursion points. These
paths constitute a special (minimal) pattern of an initial tree, called
skeleton:

Definition 7.3.1 (Skeleton). The skeleton of a term
$t \in T_{\Sigma \cup \{\Omega\}}(X)$, written skeleton(t), is the minimal
pattern (def. 7.1.11) of $t$ for which holds
$\mathrm{pos}(t, \Omega) = \mathrm{pos}(\mathrm{skeleton}(t), \Omega)$.

Let $U \subseteq \mathrm{pos}(t, \Omega)$ in $t$ with $t|_u = \Omega$ for
all $u \in U$. Then we define skeleton(t, U) as that skeleton of term $t$
which contains only the Ωs at positions $u \in U$, that is, the minimal
pattern with $\mathrm{pos}(\mathrm{skeleton}(t, U), \Omega) = U$.
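Both variants of the skeleton can be computed by keeping exactly those nodes that lie on a path to a retained Ω and replacing every other subterm by a fresh variable. The sketch below assumes nested Python tuples for terms, the string `"Omega"` for Ω, positions as tuples of argument indices, and variable names `y1, y2, ...` for the pattern variables; these choices and the function names are illustrative assumptions.

```python
from itertools import count

OMEGA = "Omega"  # assumed encoding of the undefined symbol


def omega_positions(term, prefix=()):
    """pos(term, Omega) as a list of index tuples."""
    if term == OMEGA:
        return [prefix]
    if isinstance(term, tuple):
        return [p for i, a in enumerate(term[1:], start=1)
                for p in omega_positions(a, prefix + (i,))]
    return []


def skeleton(term, keep=None):
    """skeleton(t) if keep is None, else skeleton(t, U) for U = keep (a subset of pos(t, Omega))."""
    targets = omega_positions(term) if keep is None else list(keep)
    fresh = count(1)

    def walk(t, pos):
        if pos in targets:
            return OMEGA
        if any(u[:len(pos)] == pos for u in targets):
            # The node lies on a path to a retained Omega: keep its symbol.
            return (t[0],) + tuple(walk(a, pos + (i,)) for i, a in enumerate(t[1:], start=1))
        return f"y{next(fresh)}"  # everything else is abstracted to a variable

    return walk(term, ())


t = ("cons", ("if", "c", OMEGA, "a"), ("if", "d", "b", OMEGA))
print(skeleton(t))
print(skeleton(t, keep=[(2, 3)]))
```

The second call illustrates the case where one Ω is left to a sub-schema: only the path to the retained Ω survives in the pattern.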

An example skeleton for Mod is given in figure 7.5. This is the simple case
of a single initial program which can be explained by an RPS consisting of a
single recursive equation and a main program which just calls this equation.
Note that the paths under consideration are leading to Ωs, that is,
truncation points of the unfolding of a hypothetical RPS.^

In general, it can be necessary for finding a recursive explanation of an
initial program to identify a subprogram $G_i$ which calls further
subprograms. The calls of these subprograms must be at fixed positions in
$G_i$, called sub-schema positions^. In that case, the skeleton of $G_i$
does not contain all paths leading to Ωs (see def. 7.3.1). For these
remaining Ωs it must hold that they can be reached over paths from the
hypothetical sub-schema positions. Recursion points and sub-schema
positions constitute a hypothesis for inducing an RPS.

Definition 7.3.2 (Recursion Points and Sub-Schema Positions). Let
$U_{rec}$ and $U_{sub}$ be two sets of positions with
$U_{rec} \neq \emptyset$. If for all $u_{sub} \in U_{sub}$ and all
positions $u$ above $u_{sub}$ ($u_{sub} = u \circ i$, $i \in \mathbb{N}$)
holds

1. $\nexists\, u_{rec} \in U_{rec}$ with $u_{sub} \le u_{rec}$ and

2. $\exists\, u_{rec} \in U_{rec}$ with $u < u_{rec}$,

then $(U_{rec}, U_{sub})$ is a hypothesis over recursion points $U_{rec}$
and sub-schema positions $U_{sub}$.

For the initial program given in figure 7.7 (see tab. 7.2 for the underlying
RPS) the recursion point is 3.2.λ, that is, the second ‘if’ on the
rightmost path. The sub-schema position is 3.1.λ, that is, the ‘if’ in the
left path under ‘cons’. For this position holds that it is not above the
recursion point (condition 1 of def. 7.3.2) but it can be reached over a
path leading to this recursion point (condition 2 of def. 7.3.2):
$u_{sub} = u \circ 1 = 3.1.\lambda$ for $u = 3.\lambda$.
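The two conditions of definition 7.3.2 amount to simple prefix tests on positions. The following sketch assumes positions encoded as tuples of argument indices (3.2.λ becomes `(3, 2)`), reads "u above u'" as "u is a prefix of u'", and takes $u$ to be the position directly above $u_{sub}$; the function names are hypothetical.

```python
def at_or_above(u, v):
    """u <= v on positions: u is a prefix of v."""
    return v[:len(u)] == u


def strictly_above(u, v):
    """u < v on positions: u is a proper prefix of v."""
    return u != v and at_or_above(u, v)


def is_hypothesis(u_rec, u_sub):
    """Check whether (U_rec, U_sub) is a hypothesis in the sense of def. 7.3.2."""
    if not u_rec:
        return False  # U_rec must be non-empty
    for us in u_sub:
        # Condition 1: the sub-schema position is not at or above any recursion point.
        if any(at_or_above(us, ur) for ur in u_rec):
            return False
        # Condition 2: the position directly above it lies on a path to a recursion point.
        parent = us[:-1]
        if not any(strictly_above(parent, ur) for ur in u_rec):
            return False
    return True


# The ModList example of the text: recursion point 3.2, sub-schema position 3.1.
print(is_hypothesis({(3, 2)}, {(3, 1)}))
# A sub-schema position above the recursion point violates condition 1.
print(is_hypothesis({(3, 2)}, {(3,)}))
```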

7.3.1.3 Valid Hypotheses. Recursion points and sub-schema positions
constitute a valid hypothesis if they are on paths leading to Ωs and if
the Ωs are at positions corresponding to recursion points or, for
incomplete unfoldings, above recursion points. For example, in the initial
tree given in figure 7.7 the last unfolding is incomplete: After the
‘if’-node in the right-most path, another ‘cons’-node should follow; but
instead an Ω follows.



^ For initial trees generated by a planner or by a system user that means, that 
cases for which the continuation is unknown or not calculated must be marked 
by a special symbol. 

Because the inference of a further sub-program again can involve a constant 
initial part, not just a recursive equation but an additional “main” could be 
inferred which will then be integrated in the calling subprogram. Therefore, we 
speak of sub-schema positions rather than subprogram positions.

Fig. 7.7. Initial Program for ModList

Definition 7.3.3 (Valid Recursion Points and Sub-Schema Positions).
Let $t_{init}$ be an initial tree and $(U_{rec}, U_{sub})$ a hypothesis.
$(U_{rec}, U_{sub})$ is a valid hypothesis for $t_{init}$ if

1. $\forall u \in U_{sub}$: If $u \in \mathrm{pos}(t_{init})$, then
   $\mathrm{pos}(t_{init}|_u, \Omega) \neq \emptyset$,

2. $\forall u \in U_{rec}$: $u \in \mathrm{pos}(t_{init})$ and
   $(U_{rec}, U_{sub})$ is valid for $t_{init}|_u$, or

3. $\forall u \in U_{rec}$: $\exists\, u_\Omega \in \mathrm{pos}(t_{init})$
   with $u_\Omega \le u$ and $t_{init}|_{u_\Omega} = \Omega$.

The first condition of definition 7.3.3 ensures that sub-schema positions
are at such positions in the initial tree where the subtrees contain at
least one Ω. Condition 2 recursively ensures that the hypothesis holds for
all sub-terms at the recursion points. Condition 3 ensures that the Ωs have
positions at or above recursion points.

Up to now we have only regarded the structure of an initial tree. A valid
segmentation additionally must consider the symbols of the initial tree
(i. e., the node labels). Therefore, hypothesis $(U_{rec}, U_{sub})$ will
be extended to a term $t_{skel}$ which represents a hypothesis about a
subprogram body.

Definition 7.3.4 (Valid Segmentation). Let $t_{init} \in T_{\Sigma \cup \{\Omega\}}$ with 
$pos(t_{init}, \Omega) \neq \emptyset$ be an initial program. Let $(U_{rec}, U_{sub})$ be a valid hypothesis 
about recursion points and sub-schema positions in $t_{init}$. Let $t_{skel} \in T_{\Sigma \cup \{\Omega\}}(X)$ 
be a term with $t_{skel} \leq_\Omega t_{init}$ (def. 7.1.26) and $U_{rec} \cup U_{sub} = pos(t_{skel}, \Omega)$. A 
segmentation given by $(U_{rec}, U_{sub})$ together with $t_{skel}$ is valid for $t_{init}$ if for 

- $U_c = U_{rec} \cap pos(t_{init})$ (the set of recursion points in $t_{init}$), and 
- $U_{ic} = U_{rec} \setminus U_c$ (the set of recursion points not in $t_{init}$) 

holds: 

Reachability of $\Omega$s: For all $u_{ic} \in U_{ic}$ exists a $u_\Omega \in pos(t_{init}, \Omega)$ with $u_\Omega \leq u_{ic}$. 

Validity of the Skeleton: Given the set of all positions of $\Omega$s in $t_{init}$ which are 
above the recursion points as $U_\Omega = \{u_\Omega \mid u_\Omega \leq u, u_\Omega \in pos(t_{init}, \Omega), u \in U_{ic}\}$, 
it holds $t_{skel}[U_\Omega \leftarrow \Omega] \leq_\Omega t_{init}$. 

Validity of Segmentation in Subtrees: For all $u \in U_c$ holds: 
$(U_{rec}, U_{sub})$ together with $t_{skel}$ is a valid segmentation for $t_{init}|_u$. 

The definition is somewhat complicated because we are allowing incomplete 
unfoldings. Note that we currently are not concerned with the question 
of how recursion points are obtained. We just assume that a non-empty set 
of hypothetical recursion points $U_{rec}$ is at hand which can contain positions 
which can be found in the given $t_{init}$ and can contain positions which are not 
in $t_{init}$. 

The term $t_{skel}$ is a hypothesis about the skeleton of the body of a searched-for 
subprogram. It contains $\Omega$s (truncations of unfoldings) at the hypothetical 
recursion points and sub-schema positions. For incomplete unfoldings, 
for each given recursion point there must exist an $\Omega$ at or above this 
point ("reachability"). The set $U_\Omega$ contains all such positions and the term 
$t_{skel}[U_\Omega \leftarrow \Omega]$ is the skeleton of the hypothetical subprogram body reduced 
to the size of the currently investigated and possibly incomplete sub-term. 
For a valid hypothesis it must hold further that the currently investigated 
term contains a sub-term with at least one $\Omega$ and that there are no further 
$\Omega$s in $t_{init}$. The positions in $U_{sub}$ cover all $\Omega$s which are not generated by 
the currently investigated subprogram ("validity of skeleton"). Finally, these 
conditions must hold for all further hypothetical segments ("validity of 
segmentation in subtrees"). Note that for a tree which is a subtree of $t_{init}$ 
with a recursion point as root, it can happen that not all positions in $U_{rec}$ 
are given! 

If we extend this definition from one initial tree tinit to a set of initial 
trees Tinit, we request that at least one of the initial programs contains a 
completely unfolded segment: 

Definition 7.3.5 (Segmentation of a Set of Initial Programs). $(U_{rec}, U_{sub})$ 
together with $t_{skel}$ is a valid segmentation of a set of initial programs 
$T_{init}$ if there exists a $t_{init} \in T_{init}$ with $t_{skel} \leq_\Omega t_{init}$ and if $(U_{rec}, U_{sub})$ 
together with $t_{skel}$ is a valid segmentation of all $t \in T_{init}$ as defined in 7.3.4. 



A valid segmentation partitions an initial tree into a sequence of segments. 
These segments can be indexed in analogy to the unfoldings of a subprogram 
(def. 7.1.29): 

Definition 7.3.6 (Segments of an Initial Program). Let $T_{init} \subseteq T_{\Sigma \cup \{\Omega\}}$ 
be a set of initial programs and $E$ a set of indices with $t_e \in T_{init}$ 
for all $e \in E$. Let $R = \{1, \ldots, |U_{rec}|\}$ be a set of indices of recursion points. 
Let $(U_{rec}, U_{sub})$ together with $t_{skel}$ be a valid segmentation for all $t_e \in T_{init}$ 
and $W$ the set of indices over segments constructed over $U_{rec}$. Let $G$ be a 
new function variable with $G \notin \Sigma$. Then we can define $segment(t_e, U_{rec}, w)$ 
with $G$ as name for the subprogram as segment of $t_e$ with respect to recursion 
points $U_{rec}$ with segment index $w \in W$ inductively: 

1. $segment(t, U_{rec}, \lambda) = t[U_{rec} \cap pos(t) \leftarrow G]$ if $t \neq \Omega$, and $\bot$ otherwise. 

2. $segment(t, U_{rec}, r.v) = segment(t|_{u_r}, U_{rec}, v)$ if $r \in R$ and $u_r \in pos(t)$, and $\bot$ otherwise. 



Function $segment(t_e, U_{rec}, w)$ returns for a given initial program $t_e$ and 
a set of hypothetical recursion points $U_{rec}$ the according segment with index 
$w$ if it is contained in the tree and otherwise "unknown". A new function 
variable is inserted at the position of the recursion point which will be the 
name of the to be induced subprogram. For the case of incomplete unfoldings 
the $\Omega$s positioned above hypothetical recursion points remain in the tree. 
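
The inductive definition of segments can be sketched in Python. This is a hedged illustration, not the implementation used in the text: terms are assumed to be nested tuples (symbol, child_1, ..., child_n), positions are tuples of 1-based child indices, a segment index $r_1.r_2.\ldots.\lambda$ is the tuple (r_1, r_2, ...), None stands for the unknown segment, and the test for a bare $\Omega$ in the base case is an assumption about how the "unknown" case is triggered. All function names are hypothetical.

```python
OMEGA = "Ω"     # truncated unfolding
UNKNOWN = None  # stands for the unknown segment


def subterm(t, pos):
    """Return t|pos, or None if pos is not a position of t."""
    for i in pos:
        if not isinstance(t, tuple) or i >= len(t):
            return None
        t = t[i]
    return t


def replace_at(t, pos, new):
    """Return t[pos <- new]; pos must be a position of t."""
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace_at(t[i], pos[1:], new),) + t[i + 1:]


def segment(t, u_rec, w, name="G"):
    """u_rec: list of recursion points; w: tuple of 1-based indices into u_rec."""
    if not w:
        if t == OMEGA:  # assumption: a bare Ω carries no segment
            return UNKNOWN
        for u in u_rec:
            if subterm(t, u) is not None:
                t = replace_at(t, u, name)  # cut off deeper unfoldings
        return t
    r, v = w[0], w[1:]
    if not 1 <= r <= len(u_rec):
        return UNKNOWN
    s = subterm(t, u_rec[r - 1])
    return UNKNOWN if s is None else segment(s, u_rec, v, name)


def fact_unfolding(x, depth):
    """Unfold G(x) = if(eq0(x), 1, *(x, G(pred(x)))) depth times."""
    if depth == 0:
        return OMEGA
    rest = fact_unfolding(("pred", x), depth - 1)
    return ("if", ("eq0", x), "1", ("*", x, rest))


two = ("succ", ("succ", "0"))
t_init = fact_unfolding(two, 3)
for w in [(), (1,), (1, 1), (1, 1, 1)]:
    print(w, segment(t_init, [(3, 2)], w))
```

Each call descends along the chosen recursion points and then replaces the recursion points of the reached subtree by the name G, so every segment has the size of one unfolding.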



® Note that we use the symbol $\bot$ to represent an unknown segment rather than $\Omega$ 
to discriminate between an undefined sub-term in a term ($\Omega$) and an undefined 
hypothetical function body ($\bot$). 

It is not possible that a hypothetical subprogram body consists of the 
subprogram name $G$ only (item 1 in def. 7.3.6). 

The segments of an initial program correspond to unfoldings of a hypo- 
thetical recursive equation as defined in 7.1.29. But (up to now), segments 
can additionally contain subtrees (unfoldings) of further recursive equations 
(see sect. 7.3.3). 

The set of all indices for a given initial program with given segmentation 
can be defined as 

Definition 7.3.7 (Segment Indices). Let $T$ be a set of initial trees indexed 
over $E$ with $t_e \in T$ for all $e \in E$. Let $(U_{rec}, U_{sub})$ together with $t_{skel}$ be 
a valid segmentation for all terms in $T$ and let $R = \{1, \ldots, |U_{rec}|\}$ be a set of 
indices of recursion points and $W$ the set of unfolding indices calculated over 
$R$ (def. 7.1.28). Then $W(e)$ is the set of indices over the segments contained 
in an initial tree $t_e$ with $W(e) = \{w \mid w \in W, segment(t_e, U_{rec}, w) \neq \bot\}$. 

For inducing a recursive subprogram from a given initial program, this 
initial program must be of “sufficient size”, that is, it must contain a sufficient 
number of (hypothetical) unfoldings such that all information necessary for 
folding can be obtained from the given tree. Especially, it is necessary that 
each hypothetical subprogram is unfolded at least once at each hypothetical 
recursion point. A hypothesis over recursion points of a subprogram can only 
lead to successful folding if the following proposition holds: 

Lemma 7.3.1 (Recurrence of a Valid Segmentation). Let $(U_{rec}, U_{sub})$ 
together with $t_{skel}$ be a valid segmentation for a set of initial programs $T_{init}$ 
and let $R = \{1, \ldots, |U_{rec}|\}$ be the set of indices over recursion points $U_{rec}$. 
$(U_{rec}, U_{sub})$ can only generate a subprogram which recursively explains $T_{init}$, 
if for each recursion point $u_r \in U_{rec}$ there exists an initial tree $t_e \in T_{init}$ such 
that there exists $w \in W(e)$ with $w = v \circ r$, $v \in W(e)$, $r \in R$. 

Proof: follows directly from definition 7.1.30. 

Lemma 7.3.1 is a weak consequence from the definition of recursive ex- 
planations (def. 7.1.30), because it is not required that segments correspond 
to complete unfoldings. That is, we have given just a necessary but not a 
sufficient condition for successful induction. 
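
The necessary condition of lemma 7.3.1 can be checked mechanically once the segment indices $W(e)$ are known. The sketch below is an illustration under assumptions: terms are nested tuples, positions and segment indices are tuples of 1-based integers, and a segment counts as contained in a tree if the corresponding subtree exists and is not a bare $\Omega$. The function names are hypothetical.

```python
OMEGA = "Ω"


def subterm(t, pos):
    """Return t|pos, or None if pos is not a position of t."""
    for i in pos:
        if not isinstance(t, tuple) or i >= len(t):
            return None
        t = t[i]
    return t


def segment_indices(t, u_rec):
    """W(e): indices of all segments contained in tree t."""
    indices = []

    def walk(s, w):
        if s is None or s == OMEGA:
            return
        indices.append(w)
        for r, u in enumerate(u_rec, start=1):
            walk(subterm(s, u), w + (r,))

    walk(t, ())
    return indices


def satisfies_recurrence(trees, u_rec):
    """Each recursion point r needs some tree with an index w = v o r."""
    for r in range(1, len(u_rec) + 1):
        found = False
        for t in trees:
            w_e = set(segment_indices(t, u_rec))
            if any(w and w[-1] == r and w[:-1] in w_e for w in w_e):
                found = True
                break
        if not found:
            return False
    return True


def fact_unfolding(x, depth):
    """Unfold G(x) = if(eq0(x), 1, *(x, G(pred(x)))) depth times."""
    if depth == 0:
        return OMEGA
    rest = fact_unfolding(("pred", x), depth - 1)
    return ("if", ("eq0", x), "1", ("*", x, rest))


two = ("succ", ("succ", "0"))
print(segment_indices(fact_unfolding(two, 3), [(3, 2)]))
print(satisfies_recurrence([fact_unfolding(two, 1)], [(3, 2)]))
```

A tree containing only a single unfolding fails the check, which matches the requirement that each hypothetical subprogram is unfolded at least once at each recursion point.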

7.3.1.4 Algorithm. The definition of a valid segmentation (def. 7.3.4) allows 
us to formulate some relations which can be used for an algorithmic 
construction of a segmentation hypothesis: 

- Because of conditions $t_{skel} \leq_\Omega t_{init}$ and $U_{rec} \subseteq pos(t_{skel})$ must hold 
  $U_{rec} \subseteq pos(t_{init})$. 

- From the condition "validity of segmentation in subtrees" follows for all 
  $u_{rec} \in U_{rec}$ that $node(t_{init}, u_{rec}) = node(t_{init}, \lambda)$. That is, only such 
  positions in an initial tree can be candidates for recursion points which 
  contain the same symbol as its root node.® 

- The positions of further sub-schemata can be inferred from the uppermost 
  unfolding in $t_{init}$: $U_{sub} = \{u \mid u \in lpos(skeleton(t_{init}[U_{rec} \leftarrow \Omega])) \setminus U_{rec}, pos(t_{init}|_u, \Omega) \neq \emptyset\}$. 
  (Remember that $lpos(t)$ returns the positions 
  of leaf nodes of a term, see def. 7.1.6). 

- Consequently, a hypothesis $t_{skel}$ about the skeleton of the subprogram body 
  can be constructed: $t_{skel} = skeleton(t_{init}[(U_{rec} \cup U_{sub}) \leftarrow \Omega])$. 
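
The second relation gives a cheap filter for candidate recursion points: only positions carrying the root symbol qualify. A minimal Python sketch of this filter follows, assuming terms as nested tuples and positions as tuples of 1-based child indices; it covers only this one relation, not the construction of $U_{sub}$ or $t_{skel}$, and the names are hypothetical.

```python
OMEGA = "Ω"


def node(t):
    """Symbol at the root of a term."""
    return t[0] if isinstance(t, tuple) else t


def subterm(t, pos):
    """Return t|pos, or None if pos is not a position of t."""
    for i in pos:
        if not isinstance(t, tuple) or i >= len(t):
            return None
        t = t[i]
    return t


def positions(t, prefix=()):
    """Enumerate all positions of t in depth-first, left-to-right order."""
    yield prefix
    if isinstance(t, tuple):
        for i, child in enumerate(t[1:], start=1):
            yield from positions(child, prefix + (i,))


def candidate_recursion_points(t):
    """Non-root positions whose node symbol equals the root symbol."""
    root = node(t)
    return [u for u in positions(t) if u and node(subterm(t, u)) == root]


def fact_unfolding(x, depth):
    """Unfold G(x) = if(eq0(x), 1, *(x, G(pred(x)))) depth times."""
    if depth == 0:
        return OMEGA
    rest = fact_unfolding(("pred", x), depth - 1)
    return ("if", ("eq0", x), "1", ("*", x, rest))


t_init = fact_unfolding(("succ", ("succ", "0")), 3)
print(candidate_recursion_points(t_init))
```

The filter over-approximates: deeper occurrences of the root symbol are reported as well, and it is the validity check of definition 7.3.4 that decides which candidates form a segmentation.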

As mentioned before, constructing a hypothesis about the set of possible 
recursion points Urec from a given set of initial trees is the first and crucial 
step of inducing an RPS. This hypothesis determines the set of sub-schema 
positions and the skeleton of a searched for subprogram body. It represents 
an assumption {Urec, Usub) about the segmentation of an initial tree which 
corresponds to the f-th unfolding of a searched for recursive equation. The 
validity of a hypothesis can be initially checked using the criteria given in 
definition 7.3.4 and it can be checked whether it holds recursively over a 
complete initial tree using lemma 7.3.1. If validation fails, a new set of recursion 
points Urec must be constructed. If no such set can be found, the searched for 
RPS does not consist of a main program which just calls a recursive equation, 
and search for recursion points must start at deeper levels of the tree. In the 
worst case, the complete tree would be regarded as main program with an 
empty set of recursive equations.^ 

For constructing possible sets of recursion points, each given initial tree 
must be completely traversed. The construction is based on a strategy which 
determines the sequence in which possible hypotheses are generated. The pro- 
posed strategy ensures that minimal subprograms which explain maximally 
large parts of an initial tree are constructed: 

Search for a segmentation, starting with $U_{rec} = \emptyset$ at position $u = 1.\lambda$, considering 
the following cases: 

Case 1: (The initial trees are completely traversed.) 

If there exists no further position: 

- If $U_{rec} \neq \emptyset$ then construct $U_{sub}$ and $t_{skel}$ with respect to $U_{rec}$. 
  If $(U_{rec}, U_{sub})$ together with $t_{skel}$ is a valid segmentation, then stop else 
  backtrack. 

- If $U_{rec} = \emptyset$ then no recursion points could be found and consequently, there 
  exists no subprogram which recursively explains the initial trees. 

Case 2: (The node at position $u$ is recursion point.) 

Set $U_{rec} = U_{rec} \cup \{u\}$. 

Construct $U_{sub}$ and $t_{skel}$ with respect to $U_{rec}$. 



® Note, that it is possible, that a given initial tree contains a constant part be- 
longing to the body of the calling function, such that the initial tree for which a 
recursive subprogram is to be induced has a root which is on a deeper position. 

^ Because we search for an RPS with at least one recursive equation, our algorithm 
terminates already at an earlier point - if the remaining tree is so small, that 
it is not possible to find at least one unfolding for each hypothetical recursion 
point. 

If $(U_{rec}, U_{sub})$ together with $t_{skel}$ is a valid segmentation, then progress with 
$U_{rec}$ to the next position $u'$ right of $u$. 

Otherwise go to case 3. 

Case 3: (The node at position $u$ is lying above a recursion point.) 

If there exists a position $u' = u \circ 1$ in the initial trees, progress with $U_{rec}$ at 
position $u'$. 

Otherwise go to case 4. 

Case 4: (There are no nodes lying below the current node.) 

Progress with $U_{rec}$ at the next position $u'$ right of $u$. 

This strategy realizes a search for recursion points from left to right where 
for each hypothetical recursion point search proceeds downward. The function 
for calculating the next position to the right is given in table 7.6. 



Table 7.6. Calculation of the Next Position on the Right 

Function: $nextPosRight: T_{\Sigma \cup \{\Omega\}} \times pos(t_{init}) \rightarrow pos(t_{init})$ for all $t_{init} \in T_{init}$ 
Pre: $T_{init}$ is a set of initial trees; $u$ is a position in the initial trees 
Post: next position right of $u$ if such a position exists, $\bot$ otherwise 

1. IF $u = \lambda$ 
2. THEN return $\bot$ 
3. ELSE 
   a) Let $u = u' \circ k$ 
   b) IF $\exists t \in T_{init}$ with $u' \circ (k + 1) \in pos(t)$ 
   c) THEN return $u' \circ (k + 1)$ 
   d) ELSE return $nextPosRight(T_{init}, u')$. 
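
The function of table 7.6 translates almost literally into Python. The sketch assumes terms as nested tuples (symbol, child_1, ..., child_n), positions as tuples of 1-based child indices with () for $\lambda$, and None in place of $\bot$; these encodings and the names are assumptions made for illustration.

```python
def subterm(t, pos):
    """Return t|pos, or None if pos is not a position of t."""
    for i in pos:
        if not isinstance(t, tuple) or i >= len(t):
            return None
        t = t[i]
    return t


def next_pos_right(trees, u):
    """Next position right of u in any of the trees, or None if there is none."""
    if u == ():                      # step 1: the root has no right neighbour
        return None
    parent, k = u[:-1], u[-1]        # step 3a: u = u' o k
    sibling = parent + (k + 1,)
    if any(subterm(t, sibling) is not None for t in trees):
        return sibling               # steps 3b and 3c
    return next_pos_right(trees, parent)  # step 3d: climb one level


x = ("succ", ("succ", "0"))
tree = ("if", ("eq0", x), "1", ("*", x, "Ω"))
print(next_pos_right([tree], (1, 1)))
print(next_pos_right([tree], (3, 2)))
```

Because the existence test ranges over all trees of the set, a position counts as present as soon as one initial tree contains it.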



7.3.2 Constructing a Program Body 

For a given valid segmentation the body of a hypothetical subprogram can 
be constructed. For each node of a segment it must be decided whether it 
belongs to 

- the body of the subprogram, or 

- the parameter instantiation, or 

- a further subprogram. 

In this section we will only consider subprograms without calls of further 
subprograms, that is $U_{sub} = \emptyset$. An extension for subprograms with additional 
subprogram calls is given in section 7.3.3. In the following, we will first give 
an example for constructing the program body. Afterwards, we will introduce 
theorems concerning the maximization of the program body. Finally, 
we will present an algorithm for calculating the program body for a given 
segmentation of a set of initial trees. 

7.3.2.1 Separating Program Body and Instantiated Parameters. 

For illustration we use a simple linear subprogram - the recursive equation for 
calculating the factorial of a natural number. This subprogram and its third 
unfolding are given in table 7.7. The parameter $x$ is instantiated with 2, in the 
first case represented as succ(succ(0)) and in the second case represented as 
pred(3) (short for pred(succ(succ(succ(0))))). 



Table 7.7. Factorial and Its Third Unfolding with Instantiation succ(succ(0)) (a) 
and pred(3) (b) 

Subprogram: G(x) = if(eq0(x), 1, *(x, G(pred(x)))) 

Unfolding (a): Third unfolding for $\beta(x)$ = succ(succ(0)) 

$t_{init}$ = if(eq0(succ(succ(0))), 1, *(succ(succ(0)), 
           if(eq0(pred(succ(succ(0)))), 1, *(pred(succ(succ(0))), 
           if(eq0(pred(pred(succ(succ(0))))), 1, *(pred(pred(succ(succ(0)))), $\Omega$)))))) 

Unfolding (b): Third unfolding for $\beta(x)$ = pred(3) 

$t_{init}$ = if(eq0(pred(3)), 1, *(pred(3), 
           if(eq0(pred(pred(3))), 1, *(pred(pred(3)), 
           if(eq0(pred(pred(pred(3)))), 1, *(pred(pred(pred(3))), $\Omega$)))))) 



A valid recurrent segmentation for $t_{init}$ is $U_{rec} = \{3.2.\lambda\}$, and the skeleton 
is $t_{skel} = if(y_1, y_2, *(y_3, \Omega))$. The segments for $t_{init}$ are given in table 7.8. 



Table 7.8. Segments of the Third Unfolding of Factorial for Instantiation 
succ(succ(0)) (a) and pred(3) (b) 

(a) Index w       Segment 
    $\lambda$       if(eq0(succ(succ(0))), 1, *(succ(succ(0)), $\Omega$)) 
    $1.\lambda$     if(eq0(pred(succ(succ(0)))), 1, *(pred(succ(succ(0))), $\Omega$)) 
    $1.1.\lambda$   if(eq0(pred(pred(succ(succ(0))))), 1, *(pred(pred(succ(succ(0)))), $\Omega$)) 

(b) Index w       Segment 
    $\lambda$       if(eq0(pred(3)), 1, *(pred(3), $\Omega$)) 
    $1.\lambda$     if(eq0(pred(pred(3))), 1, *(pred(pred(3)), $\Omega$)) 
    $1.1.\lambda$   if(eq0(pred(pred(pred(3)))), 1, *(pred(pred(pred(3))), $\Omega$)) 



The program body can be constructed as maximal pattern (see def. 7.1.13) 
of all given segments. For the segments constructed from the initial tree with 
instantiation $\beta(x)$ = succ(succ(0)), the program body therefore is 

$t_{G'} = if(eq0(x'), 1, *(x', G'))$. 

The body is identical with the definition of factorial given in table 7.7. 

For the segments constructed from the initial tree with instantiation 
$\beta(x)$ = pred(3), the program body is 

$t_{G''} = if(eq0(pred(x'')), 1, *(pred(x''), G''))$. 

A part of the parameter instantiation became part of the program body! This is a 
consequence of defining the program body as maximal pattern of the segments 
of the initial trees. If an instantiation is given which shares a non-trivial 
pattern (i. e., the anti-instance is not just a variable; see def. 7.1.11) with the 
substitution term (in the recursive call), then this pattern will be assumed 
to be part of the program body. A simple practical solution is to present the 
folding algorithm with a set of initial trees with different instantiations. 
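
The effect can be reproduced with a small first-order anti-unification routine. The following Python sketch is an illustration only: terms are assumed to be nested tuples, the recursion point of every segment is written as the string "G", the constant 3 is kept as an atomic leaf, and the variable names x1, x2, ... are generated by the sketch. Identical tuples of differing subterms receive the same variable, which is one common way to obtain a maximal pattern; it is not claimed to be the exact procedure of definition 7.1.16.

```python
def anti_unify(terms, table=None):
    """Most specific common pattern of a list of terms."""
    if table is None:
        table = {}
    first = terms[0]
    if all(t == first for t in terms):
        return first
    if all(isinstance(t, tuple) and t[0] == first[0] and len(t) == len(first)
           for t in terms):
        return (first[0],) + tuple(
            anti_unify([t[i] for t in terms], table)
            for i in range(1, len(first)))
    key = tuple(terms)  # the same differing subterms get the same variable
    if key not in table:
        table[key] = "x%d" % (len(table) + 1)
    return table[key]


def fact_segments(x, n):
    """Segments of n unfoldings of factorial for initial instantiation x."""
    segs = []
    for _ in range(n):
        segs.append(("if", ("eq0", x), "1", ("*", x, "G")))
        x = ("pred", x)
    return segs


seg_a = fact_segments(("succ", ("succ", "0")), 3)
seg_b = fact_segments(("pred", "3"), 3)
print(anti_unify(seg_a))
print(anti_unify(seg_b))
print(anti_unify(seg_a + seg_b))
```

Under these assumptions the segments for pred(3) alone yield a body that contains pred, while anti-unifying the segments of both instantiations together yields the body without pred, in line with the practical solution mentioned above.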

In the following, it will be shown that if a recursive equation exists for 
a set of initial trees then we can find a recursive subprogram with a “maxi- 
mized” body. This resulting subprogram together with the found parameter 
instantiation is equivalent to the “intended” subprogram with the “intended” 
instantiation. Of course, it does not hold, that the induced program and the 
intended program are equivalent for all possible parameter instantiations! 

7.3.2.2 Construction of a Maximal Pattern. The body of a subprogram 
is constructed by including all nodes in the skeleton which are common over 
all segmentations. All subtrees which differ over segments are considered as 
instantiated parameters, that is, the initial instantiations of the parameters 
when the subprogram is called and the instantiations resulting from substi- 
tutions of parameters in recursive calls. A given set of initial trees - in the 
following called examples - can contain trees with different initial parameter 
instantiations as shown in the factorial example above. 

Theorem 7.3.1 (Maximization of the Body). For a set of example initial 
trees, let $E$ be a set of example indices $e$ with $|E| > 1$. For each recursive 
subprogram $G(x_1, \ldots, x_n) = t_G$ with $X = \{x_1, \ldots, x_n\}$ and $t_G \in T_{\Sigma \cup \Phi}(X)$ 
together with initial instantiations $\beta_e : X \rightarrow T_\Sigma$ for all $x \in X$ and all 
$e \in E$ exists a subprogram $G'(x_1, \ldots, x_{n'}) = t_{G'}$ with $X' = \{x_1, \ldots, x_{n'}\}$ 
and $t_{G'} \in T_{\Sigma \cup \Phi}(X')$ together with initial instantiations $\beta'_e : X' \rightarrow T_\Sigma$ for 
all $x \in X'$ and all examples $e \in E$ such that $L(G, \beta_e) = L(G', \beta'_e)$ for all 
$e \in E$. Additionally, for each $x \in X'$ holds that the instantiations which can 
be generated by $G'$ from $\beta'_e$ do not share a common non-trivial pattern. 

Proof: see appendix BB.2. 

Theorem 7.3.1 states that if a recursive subprogram $G$ exists for a set of 
initial trees $T_{init} = \{t_e \mid e \in E\}$ which can generate all $t_e \in T_{init}$ for a given 
initial instantiation $\beta_e$, then there also exists a subprogram $G'$ such that the 
instantiations (over all segments and all examples) do not share a non-trivial 
pattern (a common prefix). Therefore, it is feasible to generate a hypothesis 
about the subprogram body for a given segmentation with recursion points 
$U_{rec}$ by calculating the maximal pattern of the segments. 

Theorem 7.3.2 (Construction of a Subprogram Body). Let $T_{init} \subseteq T_{\Sigma \cup \{\Omega\}}$ 
be a set of initial trees with $t_e \in T_{init}$ for all $e \in E$. Let $(U_{rec}, U_{sub})$ 
be a valid segmentation of $T_{init}$ with $U_{sub} = \emptyset$ and $G$ the function variable 
used for constructing the segments. The maximal pattern $t_G \in T_{\Sigma \cup \{G\}}(X)$ of all 
segments $segment(t_e, U_{rec}, w)$ is the resulting hypothesis about the program 
body. 

Proof: Follows from theorem 7.3.1 and definition 7.3.6. 

The variable instantiations for a maximal pattern $t_G$ are given by the 
subtrees at the positions where the segments differ or (for incomplete 
unfoldings) by $\bot$, if these positions do not occur in a given tree (see def. 7.3.6). 



Definition 7.3.8 (Variable Instantiations). Let $T \subseteq T_{\Sigma \cup \{\Omega\}}$ be a set of 
terms indexed over $E$, $t_G$ the maximal pattern of terms in $T$, and $X = var(t_G)$ 
the set of variables in the maximal pattern. The instantiation 
$\beta_e : X \rightarrow T_\Sigma \cup \{\bot\}$ of variables $x \in X$ at positions $u \in pos(t_G, x)$ in term $t_e \in T$ 
is defined as 

$\beta_e(x) = t_e|_u$ if $u \in pos(t_e)$, and $\beta_e(x) = \bot$ otherwise. 
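
Reading off the instantiations from a pattern is a plain lookup. In the Python sketch below, terms are again assumed to be nested tuples with string leaves, None stands for $\bot$, and each variable is looked up at its leftmost position in the pattern; the choice of the leftmost position and all names are assumptions of this illustration.

```python
def subterm(t, pos):
    """Return t|pos, or None if pos is not a position of t."""
    for i in pos:
        if not isinstance(t, tuple) or i >= len(t):
            return None
        t = t[i]
    return t


def variable_positions(pattern, variables, prefix=(), found=None):
    """Map each variable to its leftmost position in the pattern."""
    if found is None:
        found = {}
    if isinstance(pattern, tuple):
        for i, child in enumerate(pattern[1:], start=1):
            variable_positions(child, variables, prefix + (i,), found)
    elif pattern in variables:
        found.setdefault(pattern, prefix)
    return found


def instantiation(pattern, variables, t):
    """beta_e: subtree of t at each variable position, None if absent."""
    where = variable_positions(pattern, variables)
    return {x: subterm(t, u) for x, u in where.items()}


pattern = ("if", ("eq0", "x1"), "1", ("+", "x2", "G"))
complete = ("if", ("eq0", "0"), "1", ("+", ("pred", "2"), "G"))
incomplete = ("if", ("eq0", "0"), "1", "Ω")
print(instantiation(pattern, {"x1", "x2"}, complete))
print(instantiation(pattern, {"x1", "x2"}, incomplete))
```

For the truncated term the position of x2 does not exist, so its instantiation is reported as unknown rather than guessed.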



7.3.2.3 Algorithm. The maximal pattern of a set of terms can be calculated 
by first order anti-unification as described in definition 7.1.16 in section 
7.1.2. Only complete segments are considered. For incomplete segments, it 
is in general not possible to obtain a consistent introduction of variables 
during generalization. An example problem, using two terms of Fibonacci 
(introduced in table 7.3) is given in table 7.9. 



Table 7.9. Anti-Unification for Incomplete Segments 
We define $t \sqcap \Omega = \Omega \sqcap t = t$. 

Complete Segment:   $t_1$ = if(eq0(2), 1, if(eq0(pred(2)), 1, +(G, G))) 
Incomplete Segment: $t_2$ = if(eq0(pred(pred(2))), 1, $\Omega$) 
Anti-Unification:   if(eq0($x_1$), 1, if(eq0(pred(2)), 1, +(G, G))) 



Anti-unification gives us the maximal body $t_G$ of a subprogram $G$. The 
variables in the subprogram represent those subtrees which differ over the 
segments of an initial tree. After this step of folding, the instantiations of these 
variables are still a finite set of terms, namely, the differing subtrees. In section 
7.3.4 it will be shown how this finite set is generalized by inferring input 
variables, their initial instantiation, and the substitution of these variables in 
the recursive call. 

7.3.3 Dealing with Further Subprograms 

Up to now we have ignored the case that there are paths leading to $\Omega$s which 
cannot be explained by the constructed segmentation and that cannot be 
generated by the resulting maximal subprogram body. In this section we will 
present how folding works if $U_{sub}$ is not empty. First, we give an illustrative 
example. Then we introduce decomposition of initial trees with respect to 
positions in $U_{sub}$. We show that integrating calls to further subprograms in 
the body of a subprogram called from the main program $t_0$ is well-defined. 
Finally, we describe how the original initial tree can be reduced by replacing 
subtrees corresponding to calls of further subprograms by their name. 

7.3.3.1 Inducing Two Subprograms for 'ModList'. Remember the 
ModList example presented in table 7.2 and the example initial tree from 
which this RPS can be inferred given in figure 7.7. Let us look at this initial 
tree again (see fig. 7.8): We can find a first segmentation which explains the 
right-most path leading to an $\Omega$ with $U_{rec} = \{3.2.\lambda\}$. But the initial tree 
contains further subtrees with $\Omega$-leaves, that is $U_{sub} = \{3.1.\lambda\} \neq \emptyset$. If the found 
first segmentation is valid, there must exist a further subprogram which can 
generate these remaining subtrees containing $\Omega$s (marked with black boxes in 
the figure). In segments one to three (segment four is an incomplete unfolding, 
as discussed above), the left-hand subtrees of the 'cons'-nodes contain 
these yet unexplained subtrees. 

Folding has started with inferring an RPS $S = (\mathcal{G}, t_0)$ and a promising 
segmentation has been found which might result in a subprogram $G_1 \in \mathcal{G}$ 
and a main program $t_0$ which calls $G_1$. Now, the folding algorithm is called 
recursively with the unexplained subtrees as new sets of initial trees $T^u_{init}$ with 
$u \in U_{sub}$. For the ModList example, there exists only one position in $U_{sub}$ 
and we have three example trees which must be explained. The recursive call 
of the folding algorithm results again in finding a recursive program scheme 
$S^u = (\mathcal{G}^u, t^u_0)$ and not just a recursive subprogram in $\mathcal{G}$. This is because the 
call of the further sub-program might be embedded in a constant term $t^u_0$ 
which later becomes part of the body of the calling subprogram $G_1$. That 
is, we start to infer a sub-scheme which afterwards can be integrated in the 
searched-for RPS. 

Again, the first step of folding is to find a valid recurrent segmentation. 
In figure 7.9 the new set of initial trees is given. The initial hypothesis, that 
$t^u_0$ consists only of the call of the subprogram, fails. There is a constant initial 
part if(eq0(x), T, nil). Starting segmentation further down, at the first 'if'-node, 
results in a valid recurrent segmentation which can explain all three 
initial trees. 

We already described how the maximal body can be constructed by finding 
the common pattern of all segments (see sect. 7.3.2). For the searched-for 
subprogram $G_2$ we find the body if(<($x_1$, head($x_2$)), $x_1$, $G_2$). We will 
explain how initial instantiations and substitutions in recursive calls are 
calculated below, in section 7.3.4. For the moment, just believe that the resulting 
subprogram $G_2$ is the one given in the box in figure 7.9. The calling main 
program for the sub-scheme is $t^u_0$ = if(eq0($G_2(x_1, x_2)$), T, nil). The initial 
trees $T^u_{init}$ can be explained with the following initial instantiations: 



First Tree: $\beta(x_1) = 8$, $\beta(x_2) = [9,4,7]$, 

Second Tree: $\beta(x_1) = 8$, $\beta(x_2) = tail([9,4,7])$, 

Third Tree: $\beta(x_1) = 8$, $\beta(x_2) = tail(tail([9,4,7]))$. 

Fig. 7.8. Identifying Two Recursive Subprograms in the Initial Program for ModList 

Fig. 7.9. Inferring a Sub-Program Scheme for ModList 
[Panels: explanation in segment 1, segment 2, and segment 3 of the original tree] 

Now the inference of the sub-scheme is complete and the folding algorithm 
comes back to the original problem. The originally unexplained subtrees are 
replaced by the three calls of the main program $t^u_0$ given at the bottom of 
figure 7.9. The inferred subprogram $G_2$ is introduced in the set of subprograms 
$\mathcal{G}$ of the to be constructed RPS $S$. The reduced initial tree is given in 
figure 7.10. The folding algorithm proceeds now with constructing the program 
body for $G_1$. 




Fig. 7.10. The Reduced Initial Tree of ModList 



7.3.3.2 Decomposition of Initial Trees. If a valid recurrent segmentation 
$(U_{rec}, U_{sub})$ was found for a set of initial trees $T_{init}$ and if $U_{sub} \neq \emptyset$, then 
induction of an RPS is performed recursively over the set of subtrees at the 
positions in $U_{sub}$. 

Definition 7.3.9 (Decomposition of Initial Trees). Let $T_{init} \subseteq T_{\Sigma \cup \{\Omega\}}$ 
be a set of initial trees indexed over $E$. Let $(U_{rec}, U_{sub})$ be a valid segmentation 
of $T_{init}$ with $U_{sub} \neq \emptyset$. The set of initial trees $T^u_{init}$ for all $u \in U_{sub}$ is 
defined as: 

$T^u_{init} = \{segment(t_e, U_{rec}, w)|_u \mid e \in E, w \in W(e), u \in pos(segment(t_e, U_{rec}, w))\}$. 

The index set $E^u$ of elements in $T^u_{init}$ is constructed as $E^u = \{(w, e) \mid e \in E, w \in W(e), u \in pos(segment(t_e, U_{rec}, w))\}$ 
and for elements $t_{(w,e)} \in T^u_{init}$ 
with $(w, e) \in E^u$ holds $t_{(w,e)} = segment(t_e, U_{rec}, w)|_u$. 
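
The decomposition collects, for one sub-schema position $u$, the subtree at $u$ from every segment of every initial tree and indexes it by $(w, e)$. The Python sketch below illustrates this under assumptions: terms are nested tuples, positions and segment indices are tuples of 1-based integers, $u$ does not lie at or below a recursion point (so the subtree can be read directly from the unsegmented subtree), and the toy tree with placeholder leaves s0, s1, s2 and l0, l1, l2 is hypothetical, shaped only like the partial ModList body.

```python
OMEGA = "Ω"


def subterm(t, pos):
    """Return t|pos, or None if pos is not a position of t."""
    for i in pos:
        if not isinstance(t, tuple) or i >= len(t):
            return None
        t = t[i]
    return t


def decompose(trees, u_rec, u):
    """Return {(w, e): subtree at u in segment w of tree e}."""
    result = {}

    def walk(e, s, w):
        if s is None or s == OMEGA:
            return
        sub = subterm(s, u)
        if sub is not None:
            result[(w, e)] = sub
        for r, u_r in enumerate(u_rec, start=1):
            walk(e, subterm(s, u_r), w + (r,))

    for e, t in enumerate(trees):
        walk(e, t, ())
    return result


def toy_tree(subtrees):
    """Chain of if(empty(l_i), nil, cons(s_i, ...)) ending in Ω."""
    t = OMEGA
    for i in reversed(range(len(subtrees))):
        t = ("if", ("empty", "l%d" % i), "nil", ("cons", subtrees[i], t))
    return t


tree = toy_tree(["s0", "s1", "s2"])
print(decompose([tree], [(3, 2)], (3, 1)))
```

Each entry of the returned dictionary is one initial tree for the recursive call of the folding algorithm at position $u$.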

From the construction of $(U_{rec}, U_{sub})$ follows that each subtree at a position 
in $U_{sub}$ contains at least one $\Omega$ (see def. 7.3.2 and def. 7.3.3). An example 
for an initial tree with non-empty sub-schema positions was given in figure 7.7. 

7.3.3.3 Equivalence of Sub-schemata. In general, there can exist alternative 
RPSs which can explain a set of initial trees. These RPSs might have 
different numbers of parameters in the main program with different initial 
instantiations. We want to deal with induction of sub-schemata for subtrees 
of a set of given initial trees as independent sub-problem. For that reason, 
it must hold that - if there exists a valid hypothesis for initial trees in $T^u_{init}$ 
which is part of a recursive explanation of $T_{init}$ - each possible solution for 
$T^u_{init}$ can be part of the recursive explanation of $T_{init}$. 

For understanding the following theorem, remember that the main program 
$t^u_0$ of a sub-scheme must be valid over different examples for this scheme. 
For the ModList illustration (see fig. 7.9) there were three given trees for 
inferring the sub-scheme and a main program $t^u_0$ could be generated covering 
all three trees with different initial instantiations $\beta(t^u_0)$. 

Theorem 7.3.3 (Equivalence of Sub-Schemata). Let $T_{init}$ be a set of 
initial trees indexed over $E$. Let $S^1 = (\mathcal{G}^1, t^1_0)$ and $S^2 = (\mathcal{G}^2, t^2_0)$ be two 
RPSs with subprograms $G^1_1, \ldots, G^1_{n_1}$ and $G^2_1, \ldots, G^2_{n_2}$. The RPSs together 
with initial instantiations $\beta^1_e : X^1 \rightarrow T_\Sigma$ and $\beta^2_e : X^2 \rightarrow T_\Sigma$ for all $e \in E$ and 
parameters $X^1 = var(t^1_0)$ and $X^2 = var(t^2_0)$ recursively explain $T_{init}$. Let $\hat{t}^1_0$ 
be the maximal pattern of all terms $\beta^1_e(t^1_0)$ and $\hat{t}^2_0$ the maximal pattern of all 
terms $\beta^2_e(t^2_0)$ for $e \in E$. It holds that for each parameter $x^1 \in X^1$ exists a 
parameter $x^2 \in X^2$ such that for all $e \in E$: $\beta^1_e(x^1) = \beta^2_e(x^2)$. 

Proof (Equivalence of Sub-Schemata). Let $S^1$ be a sub-schema constructed by the 
"maximization of body" principle (see theorem 7.3.1) and $S^2$ a sub-schema which 
is obtained in some different way, that is, recursion points must be at "deeper" 
positions in the initial trees. 

Let $x^1$ be a parameter of $t^1_0$. For instantiations $\beta^1_e(x^1)$ holds that they do not 
share a common non-trivial pattern over all positions of $x^1$ in the segments of the 
initial trees. It holds $\exists e, e' \in E$ with $node(\beta^1_e(x^1)) \neq node(\beta^1_{e'}(x^1))$ (postulation 
in theorem 7.3.3). 

Variable $x^1$ is used in RPS $S^1$ to generate sub-terms of the given initial trees. 
Let $u_{x^1}$ be a position in initial trees $t_e$, $e \in E$, where terms are generated by 
instantiations $\beta^1_e(x^1)$. The sub-terms at positions $t_e|_{u_{x^1}}$ are identical to the initial 
instantiation $\beta^1_e(x^1)$ and do not share a non-trivial pattern. It follows that in RPS 
$S^2$, the sub-terms $t_e|_{u_{x^1}}$ must also be explained by an initial instantiation in main 
($t^2_0$) (case 1) or be part of an instantiated parameter occurring in an unfolding of a 
subprogram (case 2). 

Case 1: (sub-terms $t_e|_{u_{x^1}}$ are explained by an initial instantiation in $t^2_0$) 

Let $x^2 \in X^2$ be a variable and $u_{x^2} \in pos(t^2_0\{G_1 \leftarrow \Omega, \ldots, G_{n_2} \leftarrow \Omega\}, x^2)$ be a 
position of this variable in main with $u_{x^2} \leq u_{x^1}$. Because $x^2$ was introduced by 
anti-unification, exist $e, e' \in E$ with $node(u_{x^2}, t_e) \neq node(u_{x^2}, t_{e'})$. Because 
for all $u < u_{x^1}$ holds that $node(u, t_e) = node(u, t_{e'})$ for all $e, e' \in E$ follows 
$u_{x^2} = u_{x^1}$ and therefore $\beta^1_e(x^1) = \beta^2_e(x^2)$. 

Case 2: (sub-terms $t_e|_{u_{x^1}}$ are part of a parameter instantiation in an unfolding of 
a subprogram from $S^2$) 

Let $u_s$ be a position with $u_s < u_{x^1}$ such that terms $t_e|_{u_s}$ are instantiations of 
a parameter in an unfolding of a subprogram from $S^2$. These instantiations are 
generated by a sequence of substitutions (in the call of a subprogram from main, 
in the recursive call of a subprogram, in the call of a further subprogram, ...). 
Let $t_{subst} \in T_\Sigma(X^2)$ be a combination of such substitutions where variables in 
$X^2$ are parameters of $t^2_0$. That is, it holds $t_e|_{u_s} = t_{subst}[\beta^2_e(x_1), \ldots, \beta^2_e(x_m)] = \beta^2_e(t_{subst})$ 
for all $e \in E$. 

Let be the position of sub-term fe|ti i with respect to position us, that is 
(ie|uo)|u* = telu 1 . Because sub-terms tAu ^ for all e £ i? and therefore sub- 
terms (3i{tsubat)\-a‘ , do not share a non-trivial pattern, terms (3i{tsubat)\u^ , 
must be part of instantiations of variables x^ £ X^ . Let be a position 
of variables in tsubat with u’’^ < m“i and . Because for 

XX I ^2 

each position u < u^i holds node(/?e (ts«6st)U) = node(/3g, (fsubst)U) for all 
e,e' £ E and because Pe(x^) do not share a common prefix (postulation) must 
\io\A Plix^) = Pl{tsuhsi)\u\ =ta\u 1 =a\{x^). 

Theorem 7.3.3 states that for each RPS with a given initial instantiation 
of variables in the main program which explains the set of subtrees at 
positions $U_{sub}$ results an identical instantiation of variables. Therefore, 
independent of the induced subprogram together with a calling main program, 
the number and instantiations of parameters occurring in this main program 
are unique. The proof is based on the idea that for a given set of initial trees 
and a given sub-schema $S^1$ each other sub-schema $S^2$ which explains the 
same initial trees shares certain characteristics from which follows that 
parameter instantiations are identical. In the first case, we consider parameter 
instantiations which are part of the instantiation of parameters in the calling 
main program. If $u_{x^2}$ is above $u_{x^1}$ in the initial trees then the subtrees must 
differ at position $u_{x^2}$. But we already know that $u_{x^1}$ is exactly that position 
at which the subtrees differ the first time. Therefore $u_{x^2}$ and $u_{x^1}$ must be 
identical positions in the subtrees and as a consequence the instantiations 
which are given as the subtrees at these positions must be identical! Case 
two is analogous for sub-terms in segments corresponding to unfoldings of a 
subprogram in $S^2$. 

For illustration consider again the ModList example given in figure 7.10 
and the searched-for RPS given in table 7.2. The subprogram ModList with 
parameters l and n calls a further subprogram Mod. The "partial" program 
body of ModList, that is, the maximal pattern of all segments without 
consideration of further subprograms, is: 






ModList = if(empty(l), nil, cons(u, ModList)). 

For the set of subtrees at segment positions $u = 3.1.\lambda$, for example, the Mod 
subprogram given in table 7.2 can be induced with parameters k and n. For 
this subprogram the calling main program is if(eq0(Mod(n, head(l))), T, nil) 
with parameter n being passed through from the main program of the RPS 
and parameter k being constructed by substituting the parameter l given in 
the main program by head(l). Because ModList as well as Mod is constructed 
by calculating the maximal pattern over all segments, each possible realization 
of Mod together with its calling main must contain parameters n and k 
= head(l). 

7.3.3.4 Reduction of Initial Trees. If sub-schemata explaining all subtrees
can be induced for all positions $U_{sub}$, these sub-schemata can be
inserted in the segments of the subprogram which calls the sub-schema. That
is, the concerned subtrees are replaced by the main programs $t_0^u$ which call
further subprograms.

Lemma 7.3.2 (Reduction of Initial Trees). Let $T_{init}$ be a set of initial
trees indexed over $E$. Let $(U_{rec}, U_{sub})$ together with $t_{skel}$ be a valid
segmentation for all $t_e \in T_{init}$ and let $U_{sub} \neq \emptyset$.

1. If there exists an RPS $S^u = (\mathcal{G}^u, t_0^u)$ for each $u \in U_{sub}$ which
explains the initial trees constructed as described in definition 7.3.9
together with initial instantiations
$\beta^u_{(w,e)} : var(t_0^u) \to T_\Sigma \cup \{\bot\}$ (see footnote 8), the segments of
initial trees in $T_{init}$ can be transformed by

$$segment(t_e, U_{rec}, w) = segment(t_e, U_{rec}, w)[u \leftarrow \beta^u_{(w,e)}(t_0^u)]$$

for all trees $e \in E$, unfolding indices $w \in W(e)$, and sub-schema positions
$u \in pos(segment(t_e, U_{rec}, w))$.

2. Let $\mathcal{G}^u$ be a set of properly named subprograms (no two subprograms have
identical names $G_i$) with

$$\mathcal{G}^u = \langle G_1^u(x_1, \ldots, x_{m_1^u}) = t_1^u, \; \ldots, \; G_n^u(x_1, \ldots, x_{m_n^u}) = t_n^u \rangle$$

and let $G(x_1, \ldots, x_m) = t_G$ be the (yet unknown) subprogram for
segmentation $(U_{rec}, U_{sub})$. The RPS $S = (\mathcal{G}, t_0)$ which explains $T_{init}$ contains
all subprograms of the sub-schemata $S^u = (\mathcal{G}^u, t_0^u)$ for all $u \in U_{sub}$:

$$\mathcal{G} = \langle G(x_1, \ldots, x_m) = t_G, \; G_1^u(x_1, \ldots, x_{m_1^u}) = t_1^u, \; \ldots, \; G_n^u(x_1, \ldots, x_{m_n^u}) = t_n^u \rangle.$$

Footnote 8: There can exist subtrees for the sub-schemata which are too small
to infer all variables.

Proof (Reduction of Initial Trees). 1. If an RPS $S^u = (\mathcal{G}^u, t_0^u)$ together with an
initial instantiation $\beta^u_{(w,e)}$ explains an initial tree $t^u_{(w,e)}$, then there exists a
term $t \in \mathcal{L}(S^u, \beta^u_{(w,e)})$ with $t^u_{(w,e)} \leq_\Omega t$. Therefore, if the subtree at position $u$
in segment $segment(t_e, U_{rec}, w)$ is replaced by the instantiated calling main
program $\beta^u_{(w,e)}(t_0^u)$, then a term $t'$ with $segment(t_e, U_{rec}, w) \leq_\Omega t'$ can be
generated from $segment(t_e, U_{rec}, w)[u \leftarrow \beta^u_{(w,e)}(t_0^u)]$ using rules of the term
rewrite system $\mathcal{R}_{S^u}$.

2. To generate the original segments (or larger subtrees) from the segments where
at all positions $u \in U_{sub}$ the subtrees are replaced by the instantiated calling
main programs, all subprogram definitions of the sub-schemata $S^u = (\mathcal{G}^u, t_0^u)$
are necessary.

After reduction, in the segments of $T_{init}$ the subtrees at positions $U_{sub}$ are
replaced by the instantiated calling main of the corresponding subprograms.
That is, these subtrees no longer contain $\Omega$s and the program body of the
segments can be constructed as described in section 7.3.2.

The prescribed construction of subprograms imposes the following
restriction on initial trees: A maximal pattern for the subprograms can only
be constructed if there exist at least two trees $t^u_{(w,e)}$ such that the initial
instantiation $\beta^u_{(w,e)}(x)$ is defined (that is, $\neq \bot$) for all variables
$x \in var(t_0^u)$.

A consequence of inferring sub-schemata separately is that subprograms
are introduced locally with unique names. It is possible that two such
subprograms are identical and differ only in their names. After folding is
completed, the set of subprograms can be reduced such that only subprograms
with different program bodies remain in $\mathcal{G}$.

7.3.4 Finding Parameter Substitutions 

After constructing a subprogram body as maximal pattern of all segments,
the remaining unexplained subtrees must be parameters and their
substitutions. Anti-unification already results in a set of variables. The last
component of inducing an RPS from a set of initial trees is to identify the
substitutions of these variables in the recursive calls of the subprogram.
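
The role of anti-unification can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the implementation used in this work: terms are encoded as nested Python tuples, the function name anti_unify and the two hand-written segments are hypothetical, and the recursion point is represented by the constant "G". The sketch computes the maximal pattern of a list of terms and records which differing subterms each variable stands for.

```python
from itertools import count


def anti_unify(terms):
    """Return (pattern, bindings) for a non-empty list of terms.

    A term is a nested tuple (symbol, arg1, ..., argn); anything else
    (strings, numbers) is treated as a constant. Assumed encoding.
    """
    fresh = count(1)
    table = {}  # tuple of differing subterms -> variable name

    def go(ts):
        first = ts[0]
        if all(t == first for t in ts):
            return first  # identical everywhere: stays part of the pattern
        if isinstance(first, tuple) and all(
            isinstance(t, tuple) and t[0] == first[0] and len(t) == len(first)
            for t in ts
        ):
            # same root symbol and arity: descend into the arguments
            return (first[0],) + tuple(
                go([t[i] for t in ts]) for i in range(1, len(first))
            )
        key = tuple(ts)  # the same differences always get the same variable
        if key not in table:
            table[key] = f"x{next(fresh)}"
        return table[key]

    pattern = go(list(terms))
    return pattern, {var: insts for insts, var in table.items()}


# Two hypothetical segments of a Mod-like unfolding.
HEAD = ("head", "l")
seg1 = ("if", ("<", 8, HEAD), 8, "G")
seg2 = ("if", ("<", ("-", 8, HEAD), HEAD), ("-", 8, HEAD), "G")
pattern, bindings = anti_unify([seg1, seg2])
print(pattern)
print(bindings)
```

Because identical tuples of differing subterms are mapped to the same variable, both differing positions in the example share one variable, which is what makes the pattern maximal.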

Variable substitutions in recursive calls can be quite complex. In table 7.10
some recursive equations with different variants of substitutions are given. In
the simplest case, each variable is substituted independently of the other
variables and keeps its position in the recursive call (f1). A variable might
also be substituted by an operation involving other program parameters (f2).
Additionally, variables can switch their positions (given in the head of the
equation) in the recursive call (f3). Finally, there might be “hidden” variables
which only occur within the recursive call (f4). If you look at the body of
f4, variable z occurs only within the recursive call. The existence of such a
variable cannot be detected when the program body is constructed by
anti-unification but only a step later, when substitutions for the recursive
call are inferred.
Table 7.10. Variants of Substitutions in Recursive Calls

Individual Substitutions, Identical Positions
    f1(x, y) = if(eq0(x), y, +(x, f1(pred(x), succ(y))))

Interdependent Substitutions
    f2(x, y) = if(eq0(x), y, +(x, f2(pred(x), +(x, y))))

Switching of Variables
    f3(x, y, z) = if(eq0(x), +(y, z), +(x, f3(pred(x), z, succ(y))))

Hidden Variable
    f4(x, y, z) = if(eq0(x), y, +(x, f4(pred(x), z, succ(y))))
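
The four equations of table 7.10 can be transcribed directly into executable form. The following sketch assumes natural numbers as Python integers, with eq0, pred, succ and + mapped to the obvious integer operations; it is meant only to make the role of the hidden variable z in f4 tangible.

```python
def f1(x, y):
    # individual substitutions, identical positions
    return y if x == 0 else x + f1(x - 1, y + 1)


def f2(x, y):
    # interdependent substitution: the new y depends on x and y
    return y if x == 0 else x + f2(x - 1, x + y)


def f3(x, y, z):
    # y and z switch positions in the recursive call
    return y + z if x == 0 else x + f3(x - 1, z, y + 1)


def f4(x, y, z):
    # z occurs only inside the recursive call (hidden variable)
    return y if x == 0 else x + f4(x - 1, z, y + 1)


# With x = 0 the value of z is irrelevant for f4; after one unfolding it is not.
print(f4(0, 10, 20), f4(0, 10, 30))
print(f4(1, 10, 20), f4(1, 10, 30))
```

The two calls with x = 0 agree, whereas the calls with x = 1 differ, so the influence of z becomes visible only through the substitution in the recursive call.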



7.3.4.1 Finding Substitutions for ‘Mod’. Let us again look at the Mod
example. The valid segmentation for the second initial tree given in figure 7.9
is depicted again in figure 7.11. We already know from calculating the
subprogram body $if(<(x_1, head(x_2)), x_1, G_2)$ that there remain two different kinds
of subtrees which must be explained by variables and their substitutions in
recursive calls. Remember that in constructing the subprogram body for Mod,
there were three initial trees at hand and the program body was constructed
by anti-unifying the segments gained from all three initial trees. Variable $x_1$
reflects differences appearing already in segments of a single such initial
tree, but variable $x_2$ reflects differences between segments of different initial
trees.




Fig. 7.11. Substitutions for Mod 



For better readability we write the remaining subtrees for each segment 
of the given tree into a table: 
x_1 (1.1.1.λ)                                      x_2 (1.1.2.1.λ)   x_3 = x_1 (1.2.λ)

8                                                  tail([9,4,7])     8
-(8, head(tail([9,4,7])))                          tail([9,4,7])     -(8, head(tail([9,4,7])))
-(-(8, head(tail([9,4,7]))), head(tail([9,4,7])))  tail([9,4,7])     -(-(8, head(tail([9,4,7]))), head(tail([9,4,7])))

The middle column represents the instantiations of $x_2$, which is constant over
all segments. If only this single initial tree were considered, this term would
be part of the program body. The left-hand and right-hand columns contain
exactly the same terms on each level. With these considerations, it is enough
to consider the changes from one level to the next in the first two columns.
Terms on succeeding levels represent changes from one segment to the next,
that is, for the program body induced by this segmentation, these changes
must be explained by substitutions of parameters in the recursive call.

When going from the first to the second level, we find an occurrence
of $x_1$ as well as of $x_2$, that is, -(8, head(tail([9,4,7]))) = -($x_1$, head($x_2$)).
This leads to the hypothesis that the initial instantiations are $x_1 = 8$ and
$x_2 = tail([9,4,7])$ with substitutions $x_1 \leftarrow -(x_1, head(x_2))$ and $x_2 \leftarrow x_2$.
We used the first two segments to construct a hypothesis. As we will see
below, in general, we need two further segments to validate this hypothesis.
For now, let it be enough that we validate the hypothesis by going
from the second to the third segment. Applying the found substitution
to $x_1$ = -(8, head(tail([9,4,7]))) results in -(-(8, head(tail([9,4,7]))),
head(tail([9,4,7]))) and because $x_2$ = tail([9,4,7]), the substitution
hypothesis holds!
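
The validation step just described can be replayed mechanically. The sketch below is an illustration only: terms are encoded as nested tuples, the list [9,4,7] is kept as an opaque constant string, and the helper names substitute and validate are hypothetical. It applies the hypothetical substitutions to the instantiations on one level and compares the result with the next level.

```python
def substitute(term, env):
    """Simultaneously replace variable names in term by their values in env."""
    if isinstance(term, str) and term in env:
        return env[term]
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, env) for a in term[1:])
    return term


TAIL = ("tail", "[9,4,7]")
HEAD = ("head", TAIL)

# Instantiations per level, taken from the table for the Mod example.
columns = {
    "x1": [8, ("-", 8, HEAD), ("-", ("-", 8, HEAD), HEAD)],
    "x2": [TAIL, TAIL, TAIL],
}

# Hypothesis built from the first two levels.
hypothesis = {"x1": ("-", "x1", ("head", "x2")), "x2": "x2"}


def validate(hypothesis, columns):
    """Check that every level follows from the previous one."""
    levels = len(next(iter(columns.values())))
    for level in range(levels - 1):
        env = {var: col[level] for var, col in columns.items()}
        for var, col in columns.items():
            if substitute(hypothesis[var], env) != col[level + 1]:
                return False
    return True


print(validate(hypothesis, columns))
```

The substitution is applied simultaneously to all variables, which matters for interdependent substitutions where the new value of one variable uses the old value of another.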

The illustration gives an example of an interdependent substitution (see
tab. 7.10) because the substitution of $x_1$ depends not only on $x_1$ itself, but
also on $x_2$.

7.3.4.2 Basic Considerations. A necessary characteristic of substitutions
is that they are unambiguously determined in the (infinite set of) unfoldings
(see def. 7.1.29). In that case, substitutions form regular patterns in the initial
trees and therefore can be identified by pattern matching.

Theorem 7.3.4 (Uniqueness of Substitutions). Let $S = (\mathcal{G}, t_0)$ be an
RPS, $G_i(x_1, \ldots, x_{m_i}) = t_i$ a subprogram in $S$, and $X_i = \{x_1, \ldots, x_{m_i}\}$ the
set of parameters. Let $U_{rec}$ be the set of recursion points of $G_i$ with index set
$R = \{1, \ldots, |U_{rec}|\}$, $W$ the unfolding indices constructed over $R$, and $T_i$ the
set of all unfoldings of equation $G_i$ over instantiations $\beta : X_i \to T_\Sigma$. Let $t_i$
be a program body which corresponds to the maximal pattern of all unfoldings
$v_w \in T_i$. Let $sub(x_j, r)$ be a substitution term (def. 7.1.27) for $x_j \in X_i$ in the
recursive call of $G_i$ at position $u_r \in U_{rec}$, $r \in R$. For all terms $t_\beta \in T_\Sigma(X_i)$
with

$$\forall w \in W : \beta_{w \circ r}(x_j) = t_\beta\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\}$$

holds $t_\beta = sub(x_j, r)$.

Proof: see appendix B.3.
Because substitutions are unique over unfoldings, it holds that for each
subtree under a recursion point it is determined whether this term is a
parameter with its initial instantiation or a substitution term over such a
parameter.

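Definition 7.3.10 (Characteristics of Substitution Terms). Let
$sub(x_j, r)$ be a substitution term for parameter $x_j$ at position $r$. For all
positions $u$, starting with $u = \lambda$, holds the following inductively defined
characteristic:

$$
sub(x_j, r)|_u =
\begin{cases}
x_k & \text{if (a) } \forall w \in W : \beta_{w \circ r}(x_j)|_u = \beta_w(x_k) \text{ and} \\
    & \text{(b) } \neg\exists t \in T_\Sigma(X_i) \text{ with } t \neq x_k \text{ such that} \\
    & \quad \forall w \in W : \beta_{w \circ r}(x_j)|_u = t\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\} \text{ holds} \\
f(sub(x_j, r)|_{u \circ 1}, \ldots, sub(x_j, r)|_{u \circ n}) & \text{if } \forall w \in W : node(\beta_{w \circ r}(x_j), u) = f \in \Sigma \\
    & \text{with arity } \alpha(f) = n.
\end{cases}
$$
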

For sufficiently large initial trees, it can be decided for each parameter
in the $v$-th unfolding how it is substituted in the next, $v \circ r$-th, unfolding.
For example, in a recursive equation with two parameters $x_1$ and $x_2$ with
$\beta(x_1) = 0$ and $\beta(x_2) = succ(0)$, both parameters might be substituted by
$x_i \leftarrow succ(x_i)$. Condition (a) alone allows that $x_1 \leftarrow x_2$ is identified as
substitution, but this is only a legal substitution if condition (b) holds
additionally. For the given example, for $x_1$ in the $v$-th unfolding a
substitution term $succ(x_1)$ can be found in the $v \circ r$-th unfolding. A function
for testing uniqueness of substitution is given in table 7.11. The function makes
use of a function which tests the recurrence of a substitution given in table 7.12.



Table 7.11. Testing whether Substitutions are Uniquely Determined

Function: uniqueness : $X \times X \times pos(t_{init}) \times T_\Sigma \to bool$
Pre: $E$ is the set of indices for $T_{init}$
     $R$ is the index set of recursion points
     $W(e)$ is the set of segment indices for $t_e \in T_{init}$
     $X$ is the set of variables in $\hat{t}_G$ and $x_j, x_k \in X$
     $\beta(x_k, w, e)$ returns the instantiation of variable $x_k \in X$ for segment $w \in W(e)$
     $t_\beta$ is an instantiation of $x_j \in X$ with $t_\beta \neq \bot$ in a segment with $w = r.\lambda$
     $u$ is the position which is currently checked
Post: true if for $x_j$ no recurrent substitution can be found for a variable from
     $X \setminus \{x_k\}$, false otherwise
Side-effect: Break if an ambiguity is detected

uniqueness($x_j$, $x_k$, $u$, $t_\beta$) =
1. IF $u \notin pos(t_\beta)$
2. OR ($\neg\exists x_i \in (X \setminus \{x_k\})$ : betaRecurrent($x_j$, $x_i$, $r$, $u$)) (see table 7.12)
3. OR ($\neg\forall e, e' \in E, w \in W(e), w' \in W(e')$ :
      $node(\beta(x_j, w \circ r, e), u) = node(\beta(x_j, w' \circ r, e'), u)$)
4. AND Let $f = node(t_\beta, u)$ with $\alpha(f) = n$ In
      $\forall l = 1, \ldots, n$ : uniqueness($x_j$, $x_k$, $u \circ l$, $t_\beta$)
5. THEN return TRUE
6. ELSE return FALSE and break






Table 7.12. Testing whether a Substitution is Recurrent

Function: betaRecurrent : $X \times X \times R \times pos(t_{init}) \to bool$
Pre: $E$ is the set of indices for $T_{init}$
     $R$ is the index set of recursion points
     $W(e)$ is the set of segment indices for $t_e \in T_{init}$ and $w \in W(e)$
     $X$ is the set of variables in $\hat{t}_G$ and $x_j, x_k \in X$
     $\beta(x_k, w, e)$ returns the instantiation of variable $x_k \in X$ for segment $w \in W(e)$
     $t_\beta$ is an instantiation of $x_j \in X$ with $t_\beta \neq \bot$ in a segment with $w = r.\lambda$
     $u$ is the position which is currently checked
Post: true if in all instantiations of $x_j \in X$ in segment $w \circ r$ at position $u$ the
     instantiation of $x_k$ in segment $w$ can be found, false otherwise

betaRecurrent($x_j$, $x_k$, $r$, $u$) =
    $\forall e \in E, \forall w \circ r \in W(e)$ :
        $\beta(x_j, w \circ r, e)|_u =_\bot \beta(x_k, w, e)$
    (see def. 7.3.11 for $=_\bot$)
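
The recurrence test of table 7.12 can be sketched as follows. This is an interpretation under explicit assumptions rather than the original implementation: the instantiations are stored in a dictionary keyed by (variable, w, e), unfolding indices w are tuples of recursion-point indices with the empty tuple standing for the root segment, None plays the role of the undefined instantiation, and all function names are hypothetical.

```python
BOTTOM = None  # stands for the undefined instantiation


def subterm(term, pos):
    """Subterm of a nested-tuple term at position pos (1-based argument indices)."""
    for i in pos:
        if not isinstance(term, tuple) or i >= len(term):
            return BOTTOM
        term = term[i]
    return term


def eq_bottom(a, b):
    """Equality that also holds when one side is undefined."""
    return a is BOTTOM or b is BOTTOM or a == b


def beta_recurrent(beta, xj, xk, r, u):
    """True if beta(xj, w.r, e)|u matches beta(xk, w, e) for all known w.r and e."""
    for (var, w, e), inst in beta.items():
        if var != xj or not w or w[-1] != r:
            continue  # only segments reached through recursion point r
        previous = beta.get((xk, w[:-1], e), BOTTOM)
        if not eq_bottom(subterm(inst, u), previous):
            return False
    return True


# Instantiations of the Mod example for a single initial tree (index 0).
TAIL = ("tail", "[9,4,7]")
HEAD = ("head", TAIL)
beta = {
    ("x1", (), 0): 8,
    ("x1", (1,), 0): ("-", 8, HEAD),
    ("x1", (1, 1), 0): ("-", ("-", 8, HEAD), HEAD),
    ("x2", (), 0): TAIL,
    ("x2", (1,), 0): TAIL,
    ("x2", (1, 1), 0): TAIL,
}

print(beta_recurrent(beta, "x1", "x1", 1, (1,)))    # x1 recurs at position 1
print(beta_recurrent(beta, "x1", "x2", 1, (2, 1)))  # x2 recurs at position 2.1
```

Note that the test is vacuously true when no successor segment exists, which is one reason why a separate check for sufficiently many instances is needed.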



For a given hypothesis $\hat{t}_G$ about a subprogram body (see def. 7.3.2 in
sect. 7.3.2), variable instantiations can be determined for each segment.
After determining variable instantiations for segments, the initial
instantiations can be separated from the substitution terms by calculating the
differences between the segment-wise instantiations. To calculate variable
instantiations for segments, we can extend the definition for parameter
instantiations (def. 7.1.18) in the following way:



Definition 7.3.11 (Instantiation of Variables in a Segment). Let
$T_{init} \subset T_{\Sigma \cup \{\Omega\}}$ be a set of initial trees indexed over $E$. Let $(U_{rec}, U_{sub})$ with
$U_{sub} = \emptyset$ be a segmentation of trees in $T_{init}$ and $\hat{t}_G$ the resulting maximal
pattern with $X = var(\hat{t}_G)$. Let $W(e)$ be the set of segment indices for
$t_e \in T_{init}$. The instantiations $\beta : X \times W \times E \to T_\Sigma \cup \{\bot\}$ for variables in $X$
occurring in segments with indices $w \in W(e)$ are defined as

$$
\beta(x, w, e) =
\begin{cases}
segment(t_e, U_{rec}, w)|_u & \text{if (a) } \exists u \in pos(\hat{t}_G, x) \text{ with } u \in pos(segment(t_e, U_{rec}, w)) \text{ and} \\
    & \text{(b) } segment(t_e, U_{rec}, w)|_u \neq \bot \\
\bot & \text{otherwise.}
\end{cases}
$$

With $\beta(x, w, e) =_\bot \beta(x', w', e')$ we denote that two instantiations are either
identical or that one of the instantiations is undefined.



Because we allow for incomplete unfoldings, it can happen that in some
segments the position of a hypothetical variable does not exist. Instantiation
$\beta$ is $\bot$ (i.e., undefined) if the searched-for position is non-existent in all
segments.

Inducing the substitutions is based on the same principle as inducing the
subprogram body, that is, to detect regularities in the yet unexplained
subtrees of the segments. We realize the construction of a hypothesis for variable
substitutions in the recursive calls of a subprogram by identifying
instantiation and substitution through comparison of two successive segments ($w$ and
$w \circ r$) and validate this hypothesis for all other pairs of successive segments
($w'$ and $w' \circ r$). For validation there must exist at least two additional
successive segments where the position of the substitution term is given. Therefore,
the initial trees in $T_{init}$ must provide at least four segments
for constructing and validating the searched-for substitution of a variable.
But it is not necessary for these further segments to be given in the same
initial tree; it is enough that this information can be collected over all trees in
$T_{init}$. Thereby we keep the restrictions on initial trees minimal. Furthermore,
for calculating interdependent substitutions (see table 7.10), it is necessary
that for each substitution of a variable $x_j$ in a given segment, the substitution
terms of all variables are defined in the preceding segment.

Lemma 7.3.3 (Necessary Condition for Substitutions). Let
$T_{init} \subset T_{\Sigma \cup \{\Omega\}}$ be a set of initial trees indexed over $E$ and $\hat{t}_G$ the hypothesis of the
program body which follows from segmentation $(U_{rec}, U_{sub})$ with $U_{sub} = \emptyset$.
Recursion points are indexed over $R = \{1, \ldots, |U_{rec}|\}$. The variables occurring
in the program body are $X = var(\hat{t}_G)$. Recurrent hypotheses over substitutions
can only be induced if for each variable $x_j \in X$ and each recursive call $r \in R$
holds: $\exists e_1, e_2 \in E$ and $w_1 \in W(e_1)$, $w_2 \in W(e_2)$ with $w_1 = \lambda$,
$(w_1, e_1) \neq (w_2, e_2)$ and for $h \in \{1, 2\}$:

1. $\beta(x_j, w_h \circ r, e_h) \neq \bot$ and

2. $\forall x_k \in X : \beta(x_k, w_h \circ r, e_h) \neq \bot$.

An algorithm for testing whether enough instances are given in a set of 
initial trees is given in table 7.13. 

Table 7.13. Testing the Existence of Sufficiently Many Instances for a Variable

Function: enoughInstances : $X \to bool$
Pre: $E$ is the set of indices for $T_{init}$
     $R$ is the index set of recursion points
     $W(e)$ is the set of segment indices for $t_e \in T_{init}$
     $X$ is the set of variables in $\hat{t}_G$ and $x_j \in X$
     $\beta(x_k, w, e)$ returns the instantiation of variable $x_k \in X$ for segment $w \in W(e)$
Post: true if there are sufficiently many instances, false otherwise

enoughInstances($x_j$) =
1. ($\exists e_1 \in E : \beta(x_j, r.\lambda, e_1) \neq \bot$ AND
2.  $\forall x_k \in X : \beta(x_k, \lambda, e_1) \neq \bot$) AND
3. ($\exists e_2 \in E : \exists w \in W(e_2) : \beta(x_j, w \circ r, e_2) \neq \bot$ AND
4.  $\forall x_k \in X : \beta(x_k, w, e_2) \neq \bot$) AND
5. ($e_1 \neq e_2$ OR $w \neq \lambda$)



After construction of a substitution hypothesis from a pair of succeeding
substitutions and validating it for all (at least one further) pairs of
substitutions for which the according sub-term positions are defined in the initial
trees, it must be checked whether the complete set of substitution terms is
consistent.

Lemma 7.3.4 (Consistency of Substitutions). Let $T_{init}$ be a set of initial
trees indexed over $E$. Let $\beta : X \times W \times E \to T_\Sigma \cup \{\bot\}$ be the set of
instantiations of variables $x \in X$ in segments $w \in W(e)$, $e \in E$. Let $sub(x_j, r)$ be the
substitution terms of variables $x_j \in X$ in the recursive calls. The composition
of substitutions $sub^* : X \times W \to T_\Sigma(X)$ is inductively defined as:

$$
sub^*(x_j, w) =
\begin{cases}
x_j & w = \lambda \\
sub(x_j, r)\{x_1 \leftarrow sub^*(x_1, v), \ldots, x_m \leftarrow sub^*(x_m, v)\} & w = v \circ r.
\end{cases}
$$

For each initial tree $t_e \in T_{init}$ must exist an instantiation $\beta(x_j, \lambda, e)$ such
that for all variables $x_j \in X$ and all segments $w \in W(e)$ holds:

$$\beta(x_j, w, e) =_\bot sub^*(x_j, w)\{x_1 \leftarrow \beta(x_1, \lambda, e), \ldots, x_m \leftarrow \beta(x_m, \lambda, e)\}.$$

A substitution of a variable $x_j$ is consistent if the substitution for a given
segment at position $w$ can be constructed by successive application of the
hypothetical substitution on the initial instantiation of the variable in the
root segment ($w = \lambda$).
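
A minimal sketch of the composition of substitutions and of the consistency check may help here. It is an illustration under assumptions, not the original implementation: terms are nested tuples, unfolding indices are tuples of recursion-point indices, a single initial tree is considered, undefined instantiations are represented by None, and the names sub_star and consistent are hypothetical.

```python
def substitute(term, env):
    """Simultaneously replace variable names in term by their values in env."""
    if isinstance(term, str) and term in env:
        return env[term]
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, env) for a in term[1:])
    return term


def sub_star(sub, variables, w):
    """Composed substitution terms for all variables in segment w.

    sub maps (variable, r) to the substitution term in recursive call r.
    """
    if not w:
        return {x: x for x in variables}  # root segment: identity
    previous = sub_star(sub, variables, w[:-1])
    return {x: substitute(sub[(x, w[-1])], previous) for x in variables}


def consistent(sub, variables, beta_e):
    """Check all defined instantiations of one tree against sub_star.

    beta_e maps (variable, w) to the instantiation found in segment w.
    """
    root = {x: beta_e[(x, ())] for x in variables}
    for (x, w), inst in beta_e.items():
        if inst is None:
            continue  # undefined instantiations are compatible with anything
        if substitute(sub_star(sub, variables, w)[x], root) != inst:
            return False
    return True


TAIL = ("tail", "[9,4,7]")
HEAD = ("head", TAIL)
variables = ["x1", "x2"]
sub = {("x1", 1): ("-", "x1", ("head", "x2")), ("x2", 1): "x2"}
beta_e = {
    ("x1", ()): 8,
    ("x1", (1,)): ("-", 8, HEAD),
    ("x1", (1, 1)): ("-", ("-", 8, HEAD), HEAD),
    ("x2", ()): TAIL,
    ("x2", (1,)): TAIL,
    ("x2", (1, 1)): TAIL,
}

print(sub_star(sub, variables, (1, 1))["x1"])
print(consistent(sub, variables, beta_e))
```

For the Mod example the composed term for the third segment nests the subtraction twice, and instantiating it with the root values reproduces the third row of the table given earlier.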

7.3.4.3 Detecting Hidden Variables. Remember that the set of parameters
of a recursive subprogram initially results from constructing the hypothesis
of the program body by anti-unification of the hypothetical segments
(see sect. 7.3.2). The maximal pattern of all segments contains variables at
all positions where the subtrees differ over the segments. In many cases, the
set of variables identified in this way is already the final set of parameters of
the recursive function to be constructed. Only in the case of hidden variables
(see table 7.10), that is, variables which only occur in substitution terms, the
set of variables must be extended.

In contrast to the standard case, the instantiation of such hidden variables
cannot be identified in the segments; it must be identified within
the substitution terms instead. This can be done because, if such a variable
occurs in a recursive equation, it must be used in parameter substitutions in
the recursive call.

Lemma 7.3.5 (Identification of Hidden Variables). Let $S = (\mathcal{G}, t_0)$ be
an RPS, $G_i(x_1, \ldots, x_{m_i}) = t_i$ a subprogram in $S$, and $X_i = \{x_1, \ldots, x_{m_i}\}$ the
set of parameters of $G_i$. Let $U_{rec}$ be the set of recursion points of $G_i$ indexed
over $R = \{1, \ldots, |U_{rec}|\}$, $W$ the set of unfolding indices constructed over $R$,
and $T_i$ the set of all unfoldings of $G_i$ with instantiations $\beta : X_i \to T_\Sigma$. The
program body $t_i$ is the maximal pattern of all unfoldings in $T_i$.

Let $X_B = var(t_i[U_{rec} \leftarrow \Omega])$ be the set of variables occurring in the program
body. Let $X_H = X_i \setminus X_B$ be the set of hidden variables of subprogram $G_i$.
Let $sub(x_j, r) \in T_\Sigma(X_i)$ be a substitution term with $var(sub(x_j, r)) \cap X_H \neq \emptyset$.
Let $x_h \in var(sub(x_j, r)) \cap X_H$ be a hidden variable used in the substitution
term at position $u_h \in pos(sub(x_j, r), x_h)$. It holds:

1. $\neg\exists x_k \in X_B$ with $\forall w \in W : \beta_{w \circ r}(x_j)|_{u_h} = \beta_w(x_k)$.

2. $\neg\forall w_1, w_2 \in W : node(\beta_{w_1 \circ r}(x_j), u_h) = node(\beta_{w_2 \circ r}(x_j), u_h)$.

3. $\neg\exists u < u_h$ and $\neg\exists x_k \in X_i$ with $\forall w \in W : \beta_{w \circ r}(x_j)|_u = \beta_w(x_k)$.

Proof (Identification of Hidden Variables).

1. Suppose there exists a variable $x_k \in X_B$ for which holds condition 1 in lemma
7.3.5, then it must hold that $x_k \neq x_h$ because $x_k \in X_B$, $x_h \in X_H$, and
$X_B \cap X_H = \emptyset$; and it must hold for all $w \in W$ that

- $\beta_{w \circ r}(x_j)|_{u_h} = \beta_w(x_k)$ (given postulation)

- $\beta_{w \circ r}(x_j)|_{u_h} = \beta_w(x_h)$ (follows from supposition)

Consequently, for all $w \in W$ must hold $\beta_w(x_k) = \beta_w(x_h)$ which contradicts
the assumption that $t_i$ is a maximal pattern of all unfoldings over $\beta$.

2. For all $w \in W$ holds that $\beta_{w \circ r}(x_j)|_{u_h} = \beta_w(x_h)$. Because $t_i$ is a maximal
pattern, variables do not have a common non-trivial pattern (a common
prefix). Therefore, there must exist two positions $w_1, w_2 \in W$ such that
$node(\beta_{w_1 \circ r}(x_h), \lambda) \neq node(\beta_{w_2 \circ r}(x_h), \lambda)$ and therefore also
$node(\beta_{w_1 \circ r}(x_j), u_h) \neq node(\beta_{w_2 \circ r}(x_j), u_h)$.

3. Follows from Lemma B.3.1 (appendix B.3).

From lemma 7.3.5 follows that for each substitution term of a variable it can
be decided whether it is constructed using a function from $\Sigma$ or a hidden
variable. The initial instantiations of such hidden variables can be induced
from the substitution terms. An algorithm for determining hidden variables
is given in table 7.14.



Table 7.14. Determining Hidden Variables

Function: hiddenVar : $pos(t_{init}) \times X \times R \to X'$
Pre: $u$ is a position in $\beta(x_j, w, e)$
     $X = \{x_1, \ldots, x_m\}$ is the set of variables
     $\beta(x_k, w, e)$ returns the instantiation of variable $x_k \in X$ for segment $w \in W(e)$
     $R$ is the index set of recursion points
Post: new variable symbol $x_{m+1}$
Side-effect: $X$ is extended by $x_{m+1}$; $\beta$ is extended by the values of $x_{m+1}$; Break (and
     backtracking to calculating segmentations) if there are not enough instances of
     $x_{m+1}$

1. $m = |X|$
2. Generate new variable symbol $x_{m+1} \notin X$
3. $X = X \cup \{x_{m+1}\}$
4. FORALL $e \in E$ DO
5.   FORALL $w \in W$ DO
6.     IF $w \circ r \in W(e)$
7.     THEN $\beta(x_{m+1}, w, e) = \beta(x_j, w \circ r, e)|_u$
8. IF NOT(enoughInstances($x_{m+1}$)) (see table 7.13)
9. THEN break
10. return $x_{m+1}$
7.3.4.4 Algorithm. The core of the algorithm for calculating substitution
terms can be obtained by extending definition 7.3.10 and is given in
table 7.15. The algorithm makes use of functions uniqueness (table 7.11),
betaRecurrent (table 7.12), and hiddenVar (table 7.14).



Table 7.15. Calculating Substitution Terms of a Variable in a Recursive Call

Function: calcSub : $X \times R \times pos(t_{init}) \to T_\Sigma$
Pre: $E$ is the set of indices for $T_{init}$
     $R$ is the index set of recursion points with $r \in R$
     $W(e)$ is the set of segment indices for $t_e \in T_{init}$
     $X$ is the set of variables in $\hat{t}_G$ and $x_j, x_k \in X$
     $\beta(x_k, w, e)$ returns the instantiation of variable $x_k \in X$ for segment $w \in W(e)$
Post: a valid substitution term $sub(x_j, r)$ for variable $x_j \in X$ in the $r$-th recursive
     call.
Side-effect: Break if a variable is not uniquely determined (caused by
     uniqueness) or if there are not enough instances for calculating the
     instantiation of a hidden variable (caused by hiddenVar).

calcSub($x_j$, $r$, $u$) =
1. IF betaRecurrent($x_j$, $x_k$, $r$, $u$)
2. AND (Let $t_\beta = \beta(x_j, w \circ r, e)$ for $e \in E$, $w \in W(e)$ and $\beta(x_j, w \circ r, e) \neq \bot$ In
      uniqueness($x_j$, $x_k$, $u$, $t_\beta$))
3. THEN $x_k$
4. ELSE IF $\forall e \in E, w \in W : node(\beta(x_j, w \circ r, e), u) = f$ with $\alpha(f) = n$
5.   THEN $f$(calcSub($x_j$, $r$, $u \circ 1$), ..., calcSub($x_j$, $r$, $u \circ n$))
6.   ELSE hiddenVar($u$, $x_j$, $r$)
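
The recursive structure of calcSub can be sketched in simplified form. The following is an illustration only and deliberately omits parts of table 7.15: the uniqueness test is left out (the first matching variable wins), a hidden variable is merely reported as a marker instead of being added to the variable set, and the data layout (a dictionary keyed by (variable, w, e) with tuples as unfolding indices) as well as the name calc_sub are assumptions.

```python
BOTTOM = None  # stands for the undefined instantiation


def subterm(term, pos):
    """Subterm of a nested-tuple term at position pos (1-based argument indices)."""
    for i in pos:
        if not isinstance(term, tuple) or i >= len(term):
            return BOTTOM
        term = term[i]
    return term


def calc_sub(beta, variables, xj, r, u=()):
    """Substitution term for xj in recursive call r, built top-down from u."""
    # all (w, e, instantiation of xj in segment w.r) that are defined
    succ = [(w[:-1], e, t) for (x, w, e), t in beta.items()
            if x == xj and w and w[-1] == r and t is not BOTTOM]
    if not succ:
        raise ValueError("no successor segments for " + xj)
    parts = [subterm(t, u) for _, _, t in succ]
    # case 1: the subterm at u is always the previous value of some variable
    for xk in variables:
        if all(p is not BOTTOM and p == beta.get((xk, w, e), BOTTOM)
               for p, (w, e, _) in zip(parts, succ)):
            return xk
    # case 2: the same symbol with the same arity in all instantiations
    shapes = {(p[0], len(p)) if isinstance(p, tuple) else (p, 0) for p in parts}
    if len(shapes) == 1:
        symbol, size = next(iter(shapes))
        if size == 0:
            return symbol  # a constant
        return (symbol,) + tuple(
            calc_sub(beta, variables, xj, r, u + (i,)) for i in range(1, size)
        )
    # case 3: neither a known variable nor a common symbol
    return ("HIDDEN", u)


TAIL = ("tail", "[9,4,7]")
HEAD = ("head", TAIL)
beta = {
    ("x1", (), 0): 8,
    ("x1", (1,), 0): ("-", 8, HEAD),
    ("x1", (1, 1), 0): ("-", ("-", 8, HEAD), HEAD),
    ("x2", (), 0): TAIL,
    ("x2", (1,), 0): TAIL,
    ("x2", (1, 1), 0): TAIL,
}
print(calc_sub(beta, ["x1", "x2"], "x1", 1))
print(calc_sub(beta, ["x1", "x2"], "x2", 1))
```

For the Mod example the sketch recovers the substitution terms found by hand in section 7.3.4.1. Where several variables would match at the same position, the order of the variable list decides, which is exactly the ambiguity that the uniqueness test of table 7.11 is meant to detect.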



Integrating all components for calculating substitutions introduced in this
section, we obtain the following algorithm:

1. Instantiate $X = var(\hat{t}_G)$ and calculate the initial instantiation as defined
in 7.3.11.

2. Determine whether there are enough succeeding substitution terms as
proposed in lemma 7.3.3. If there are not enough terms, break and
backtrack to calculating a new segmentation.

3. For each variable $x_j \in X$ and each recursive call $r \in R$ calculate the
substitutions, starting at $u = \lambda$, using the algorithm in table 7.15.

4. Check consistency of instantiations as proposed in lemma 7.3.4. If an
inconsistency is detected, break and backtrack to calculating a new
segmentation.

5. If $|T_{init}| > 2$ then check whether there exist at least two initial trees
in $T_{init}$ with complete initial instantiations as proposed in 7.3.2 in
section 7.3.3. If this is not the case, break and backtrack to calculating a
segmentation.

6. Initially, instantiate $t_G = \hat{t}_G$. Instantiate $m = |X|$. Calculate indices for
the variables in $X$ using $\{1, \ldots, m\}$. For each $u_r \in U_{rec}$ extend $t_G$ in the
following way: $t_G = t_G[u_r \leftarrow G(sub(x_1, r), \ldots, sub(x_m, r))]$.

7. Return $G(x_1, \ldots, x_m) = t_G$ and for all initial trees $t_e$ return
$\beta_e = \beta(x_i, \lambda, e)$ for all $x_i \in X$.

7.3.5 Constructing an RPS 

7.3.5.1 Parameter Instantiations for the Main Program. When
constructing an RPS $S(\mathcal{G}, t_0)$ which recursively explains a set of initial trees
$T_{init}$, for each $t_e \in T_{init}$ the initial instantiation $\beta_e$ of variables in $t_0$
must be calculated. This can be done by using slightly modified versions of
definitions 7.3.8 (instantiations of variables in a hypothetical subprogram body,
section 7.3.2) and 7.3.11 (instantiations of variables in a segment, defined for
parameter instantiations in recursive calls, section 7.3.4).

For a recursive program scheme $S(\mathcal{G}, t_0)$ each initial tree $t_e \in T_{init}$ implies
an instantiation $\beta_e$ of variables in $t_0$.

Definition 7.3.12 (Instantiation of $t_0$). Let $T_{init} \subset T_{\Sigma \cup \{\Omega\}}$ be a set of
initial trees indexed over $E$ and $S(\mathcal{G}, t_0)$ the RPS which recursively explains
$T_{init}$. Let $t_0^e$ be the main program for $t_e \in T_{init}$. The instantiations of $t_0$ for
a tree $t_e$ can be constructed as

$$
\beta_e(x) =
\begin{cases}
t_0^e|_u & \text{if } \exists u \in pos(t_0, x) \text{ with } u \in pos(t_0^e) \text{ and } t_0^e|_u \neq \bot \\
\bot & \text{otherwise.}
\end{cases}
$$

If there exists a recursive subprogram which explains all trees in $T_{init}$, then
the variable instantiations in the main program which calls this subprogram
can be calculated with respect to the subprogram.

Definition 7.3.13 (Instantiation of $t_0$ wrt a Subprogram). Let
$T_{init} \subset T_{\Sigma \cup \{\Omega\}}$ be a set of initial trees indexed over $E$ and $G(x_1, \ldots, x_m) = t_G$ be a
subprogram which recursively explains the trees $t_e \in T_{init}$ together with
variable instantiations $\beta_e^G : \{x_1, \ldots, x_m\} \to T_\Sigma \cup \{\bot\}$. The instantiations of $t_0$
for a tree $t_e$ can be constructed as

$$
\beta_e(x) =
\begin{cases}
\beta_e^G(G(x_1, \ldots, x_m))|_u & \text{if (a) } \exists u \in pos(t_0, x) \text{ with } u \in pos(\beta_e^G(G(x_1, \ldots, x_m))) \text{ and} \\
    & \text{(b) } \beta_e^G(G(x_1, \ldots, x_m))|_u \neq \bot \\
\bot & \text{otherwise.}
\end{cases}
$$

7.3.5.2 Putting All Parts Together. Now we have all components for a
system for inducing recursive explanations from a set of initial trees. The core
of the system is to find a set of recursive subprograms. An overview of the
induction of a subprogram is given in figure 7.12. Program BuildSub involves
the following steps: (1) finding a valid recurrent segmentation $U_{rec}$ for a set of
initial trees (section 7.3.1), (2) constructing a hypothesis about the program
body by calculating the maximal pattern of the segments (section 7.3.2), (3)
interpreting the remaining subtrees as variable instantiations, (4) determining
the variable substitutions and the initial instantiation of the variables
(section 7.3.4). If there are subtrees which are not covered by the current
segmentation, these trees are considered to belong to further subprograms
and the process of inducing a (sub-) RPS is called recursively (see BuildRPS
in figure 7.13). Whenever one of the integrity conditions formulated in the
sections above is violated, the algorithm backtracks to step one.



BuildSub($T_{init}$):




Fig. 7.12. Steps for Calculating a Subprogram 



The complete procedure for inducing an RPS is given in figure 7.13.
Starting at the roots of the initial trees in $T_{init}$, the system searches for a
subprogram together with a calling main program and an initial instantiation
of variables. Possibly, the searched-for “top-level” subprogram calls further
subprograms which are also constructed (BuildSub, figure 7.12). If no
solution is found, the root node is interpreted as a constant function in the main
program and BuildRPS starts with all subtrees of the root node as new set
of initial trees. If none of the new initial trees contains an $\Omega$, no recursive
generalization is constructed and there is only a main program which is the
maximal pattern of all trees; otherwise, BuildRPS is called recursively for all
sets of initial trees at fixed positions under the root.

The strategy used is to find a subprogram which explains as large a part
of the initial trees as possible at a level in the trees as high as possible.
Remember that, if only one initial tree is given, it is not possible to determine
the parameters of main program $t_0$; otherwise, the parameters of main are
calculated by anti-unification of the main programs of all initial trees (see
section 7.2.3). Remember further that currently subprograms with different
names are induced for each unexplained subtree of a subprogram to be
induced. Subprograms with identical bodies can be identified after synthesis is
completed (see section 7.3.3.4 and section 7.3.5.4).

7.3.5.3 Existence of an RPS. If an RPS can be induced from a set of
initial trees using the approach presented here, then induction can be seen
as a proof of existence of a recursive explanation, given the restrictions
presented in section 7.2.2.

Theorem 7.3.5 (Existence of an RPS). Let $T_{init}$ be a set of initial trees
indexed over $E$. $T_{init}$ can be explained recursively by an RPS iff:

1. $T_{init}$ can be recursively explained by a subprogram $G(x_1, \ldots, x_m) = t_G$,
or

2. $\forall e \in E : \exists f \in \Sigma$ with $\alpha(f) = n$, $n > 0$, and $node(t_e, \lambda) = f$, and
for all $T^k = \{t_e|_{k.\lambda} \mid e \in E\}$ with $k = 1, \ldots, n$ holds:

a) $\forall t \in T^k : pos(t, \Omega) = \emptyset$, or

b) $\forall t \in T^k : pos(t, \Omega) \neq \emptyset$ and there exists an RPS $S^k = (\mathcal{G}^k, t_0^k)$ which
recursively explains the trees in $T^k$.

Proof (Existence of an RPS). It must be shown (1) that an RPS exists, if
propositions (1) and (2) in theorem 7.3.5 hold, and (2) that no RPS exists, if
propositions (1) and (2) do not hold.

1. Existence of an RPS, if propositions (1) and (2) in theorem 7.3.5 hold:

Proposition 1: Let $G(x_1, \ldots, x_m) = t_G$ be a subprogram which recursively
explains all trees in $T_{init}$ (def. 7.1.30). Then there exist
$\beta_e^G : \{x_1, \ldots, x_m\} \to T_\Sigma \cup \{\bot\}$ such that $G$ with instantiations $\beta_e^G$
recursively explains $t_e \in T_{init}$. Let $t_0$ be the maximal pattern of all terms
$\beta_e^G(G(x_1, \ldots, x_m))$ (theorem 7.3.1) and $\beta_e : var(t_0) \to T_\Sigma \cup \{\bot\}$ (def. 7.3.13)
the initial instantiations of variables in $t_0$ for tree $t_e \in T_{init}$. Then the RPS
$S(\langle G(x_1, \ldots, x_m) = t_G \rangle, t_0)$ together with instantiations $\beta_e$ recursively
explains the trees in $T_{init}$.

Proposition 2: For all $T^k$, $k = 1, \ldots, n$, with $pos(t, \Omega) \neq \emptyset$ by assumption
must exist an RPS $S^k = (\mathcal{G}^k, t_0^k)$ which recursively explains the trees in $T^k$
and there must exist instantiations $\beta_e^k : var(t_0^k) \to T_\Sigma \cup \{\bot\}$ for
parameters in the main programs $t_0^k$. For each term $t_0^e$ it must hold that
$node(t_0^e, \lambda) = f$ with arity $\alpha(f) = n$. For all $k = 1, \ldots, n$ must hold:

$$
t_0^e|_{k.\lambda} =
\begin{cases}
\beta_e^k(t_0^k) & pos(t, \Omega) \neq \emptyset \\
t_e|_{k.\lambda} & pos(t, \Omega) = \emptyset
\end{cases}
\qquad \text{for } t \in T^k.
$$

The “global” main $t_0$ then is the maximal pattern of all $t_0^e$ with variable
instantiations $\beta_e(x)$ (def. 7.3.12). $\mathcal{G}$ is the set of all subprograms of the
sub-schemata $S^k$ and the RPS $S(\mathcal{G}, t_0)$ together with instantiations $\beta_e$
recursively explains all trees in $T_{init}$.

2. Non-Existence of an RPS, if propositions (1) and (2) in theorem 7.3.5 do not
hold:

If proposition (1) does not hold, then there cannot exist an RPS with a main
program $t_0$ which just calls a recursive subprogram. That is, there must exist
an RPS with a larger main program for which proposition (2) does not hold.
If there does not exist a symbol $f \in \Sigma$ which is identical for all roots of the
initial trees $t_e \in T_{init}$, then there cannot exist a (first-order) main program
which explains all initial trees.

If there exist $t_1 \in T^k$ with $pos(t_1, \Omega) = \emptyset$ and $t_2 \in T^k$ with $pos(t_2, \Omega) \neq \emptyset$,
then on the one hand, $T^k$ cannot be explained by an RPS (by assumption)
because of $t_1$, and, on the other hand, $T^k$ cannot be explained by a
non-recursive term with variable instantiations because of $t_2$.

BuildRPS($T_{init}$):

Fig. 7.13. Overview of Inducing an RPS


7.3.5.4 Extension: Equality of Subprograms. After an RPS $S(\mathcal{G}, t_0)$ is
induced, the set of subprograms $\mathcal{G}$ can possibly be reduced by unifying
subprograms $G_i \in \mathcal{G}$ with equivalent bodies and different names.* In the
simplest case, the bodies of the two subprograms are identical except for the
names of variables. In general, $G_i$ and $G_j$ can contain calls of further
subprograms. $G_i$ and $G_j$ are equivalent if these subprograms are equivalent,
too. Table 7.16 gives an example where $G_A$ calls a further subprogram $G_B$ and
where variables of $G_A$ are partially instantiated, with different values for
two different positions. Subprogram $G_A$ could be generated from the two
partially instantiated subprograms by (a) determining whether the further
subprograms are equivalent and by (b) constructing the maximal pattern of
instances of $G_A$. Note, however, that a strategy for deciding when different
subprograms should be identified as a single subprogram would be needed to
realize this simplification of program schemes.



Table 7.16. Equality of Subprograms

$G_A(x_1, x_2, x_3) = if(eq1(x_1), G_B(x_2), G_A(-(x_1, x_3), x_2, x_3))$

Induced subprograms:

$G_A$ is called with $x_2 = 1$ and $x_3 = 2$:

$G_{A1}(x_1) = if(eq1(x_1), G_{B1}(1), G_{A1}(-(x_1, 2)))$

$G_A$ is called with $x_2 = 2$ and $x_3 = 3$:

$G_{A2}(x_1) = if(eq1(x_1), G_{B2}(2), G_{A2}(-(x_1, 3)))$
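
Step (b) for the two induced subprograms can be illustrated with a small sketch. It assumes that step (a) has already been carried out, so the subprogram names are unified to GA and GB, and it treats the maximal pattern as a first-order anti-unification of two terms. The function name anti-unify and the generated variable names V1, V2 are illustrative assumptions, not part of the described system.

```lisp
;; Sketch only: maximal pattern of two terms by first-order anti-unification.
;; Differing subterms are replaced by fresh variables V1, V2, ...;
;; the same pair of differing subterms always receives the same variable.
(defun anti-unify (a b)
  "Return a term that generalizes both A and B."
  (let ((table '())      ; alist: (subterm-a . subterm-b) -> variable
        (counter 0))
    (labels ((walk (s u)
               (cond ((equal s u) s)
                     ;; same function symbol and arity: descend into arguments
                     ((and (consp s) (consp u)
                           (eq (first s) (first u))
                           (= (length s) (length u)))
                      (cons (first s) (mapcar #'walk (rest s) (rest u))))
                     ;; otherwise introduce (or reuse) a variable
                     (t (let ((entry (assoc (cons s u) table :test #'equal)))
                          (if entry
                              (cdr entry)
                              (let ((var (intern (format nil "V~D"
                                                         (incf counter)))))
                                (push (cons (cons s u) var) table)
                                var)))))))
      (walk a b))))

;; Bodies of the two induced subprograms after renaming GA1/GA2 to GA
;; and GB1/GB2 to GB.
(anti-unify '(if (eq1 x1) (gb 1) (ga (- x1 2)))
            '(if (eq1 x1) (gb 2) (ga (- x1 3))))
;; => (IF (EQ1 X1) (GB V1) (GA (- X1 V2)))
```

The two fresh variables correspond to the parameters $x_2$ and $x_3$ of the general subprogram; deciding when such a merge is desirable remains the open strategy question mentioned in the text.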



7.3.5.5 Fault Tolerance. The current, purely syntactical approach to folding
does allow for initial trees where some of the segments of a hypothetical
unfolding are incomplete. Incompleteness could, for example, result from
cases that were not considered when the initial trees were generated (by a
user or another system). But it does not allow for initial trees which contain
defects, for example wrong function symbols. Such defects could, for example,
result from user-generated initial trees or traces, where it might easily happen
that trees are unintentionally flawed (by typing errors etc.). A possible solution for this

* This is a possible extension of the system, not implemented yet.






problem could be that not only an RPS is induced but that, additionally,
minimal transformations of the given initial trees are calculated and proposed
to the user. This problem is open for further research.



7.4 Example Problems 

To illustrate the presented approach to folding finite programs we present 
some examples. First, we give time measures for some standard problems. 
Afterwards we demonstrate application to planning problems which we in- 
troduced in chapter 3. 

7.4.1 Time Effort of Folding 

Figures 7.14 and 7.15 give the time effort for unfolding and folding the facto- 
rial (see tab. 7.7) and the Fibonacci function (see tab. 7.3). The experiments 
were done by unfolding a given RPS to a certain depth and then folding 
the resulting finite program term again. In all experiments the original RPS 
could be inferred from its n-th unfolding (n = 3, . . . , 10). The procedure for 
obtaining the time measures is described in appendix A.6.
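
The unfolding half of this experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the measured implementation: terms are nested lists, the subprogram is called G and has a single parameter x, and the names unfold, count-omega, *fac-body* and *fib-body* are hypothetical. The two bodies are the usual textbook definitions and may differ in detail from tables 7.7 and 7.3.

```lisp
;; Sketch only: unfold a one-parameter subprogram G to a given depth.
;; A recursive call is written (G arg); calls that exceed the depth
;; are replaced by the undefined symbol OMEGA (an unfolding point).

(defparameter *fac-body*
  '(if (eq0 x) 1 (* x (G (pred x)))))

(defparameter *fib-body*
  '(if (eq0 x) 0
       (if (eq0 (pred x)) 1
           (+ (G (pred x)) (G (pred (pred x)))))))

(defun unfold (term body depth)
  "Replace every call (G arg) in TERM by BODY with X bound to arg."
  (cond ((atom term) term)
        ((eq (first term) 'G)
         (if (zerop depth)
             'omega
             (let ((arg (unfold (second term) body depth)))
               ;; instantiate the body, then unfold it one level less
               (unfold (subst arg 'x body) body (1- depth)))))
        (t (mapcar (lambda (sub) (unfold sub body depth)) term))))

(defun count-omega (term)
  "Number of unfolding points (OMEGA leaves) in TERM."
  (cond ((eq term 'omega) 1)
        ((atom term) 0)
        (t (reduce #'+ (mapcar #'count-omega term)))))

(count-omega (unfold '(G x) *fac-body* 3))   ; => 1
(count-omega (unfold '(G x) *fib-body* 3))   ; => 8
```

In this sketch the linear recursion keeps a single unfolding point at every depth, while the tree recursion doubles the number of unfolding points with each additional level.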

Folding takes between 3 and 4 times as long as unfolding. For linear
recursive functions, folding time is linear and for tree recursive functions,
folding time is exponential in the number of unfolding points, as should be
expected.




Unfolding-depth n = 3, . . . , 10 (x-axis) measured in seconds (y-axis) 
Fig. 7.14. Time Effort for Unfolding/Folding Factorial 



The greatest amount of folding time is used for constructing the substitu- 
tions and the second greatest amount is used for calculating a valid recurrent 
segmentation. Figure 7.16 gives these times for the factorial function. 
Unfolding-depth n = 3, . . . , 10 (x-axis) measured in seconds (y-axis)
Fig. 7.15. Time Effort for Unfolding/Folding Fibonacci 




Unfolding-depth n = 3, . . . , 10 (x-axis) measured in seconds (y-axis)

Fig. 7.16. Time Effort Calculating Valid Recurrent Segmentations and Substitu- 
tions for Factorial 



Both for factorial and for Fibonacci the first segmentation hypothesis 
leads to success, that is, no backtracking is needed. For a variant of factorial 
with a constant initial part (see table 7.17), one backtrack is needed. For an 
unfolding depth n = 3 time for folding goes up to 0.36 sec (from 0.12 sec) 
and for n = 4 to 0.44 sec (from 0.16 sec). 

For the ModList function (see tab. 7.2) an RPS with two subprograms
must be inferred. For the third unfolding of ModList, where Mod is unfolded
once in the first, twice in the second, and three times in the third unfolding,
folding time is 1.07 sec (with 0.16 sec for unfolding). For the fourth unfolding
with one to four unfoldings of Mod, folding time is 3.20 sec (with 0.33 sec for
unfolding).
Table 7.17. RPS for Factorial with Constant Expression in Main

$\mathcal{G} = \langle G(x) = if(eq0(pred(x)), 1, *(x, G(pred(x)))) \rangle$
$t_0 = if(eq0(x), 1, G(x))$



7.4.2 Recursive Control Rules 

In the following we present how RPSs for planning problems such as the ones 
presented in chapter 3 can be induced. In this section we do not describe 
how appropriate initial trees can be generated from universal plans but just 
demonstrate that the recursive rules underlying a given planning domain can 
be induced if appropriate initial trees are given! 

For the function symbols used in the signature of the RPSs we assume
that they are predefined. Typically such symbols refer to predicates and
operators defined in the planning domain. Furthermore, for planning problems,
a situation variable s is introduced which represents the current planning
state. A state can be represented as a list of literals - for Strips-like planning
(see chap. 2) - or as a list or array over primitive types such as numbers - for
a functional extension of Strips (see chap. 4). Generating initial trees from
universal plans is discussed in detail in chapter 8.

The clearblock problem introduced in section 3.1.4.1 in chapter 3 has an
underlying linear recursive structure. In figure 7.17 an initial tree for a three
block problem is given (the ontable literal is omitted for better readability).
The resulting RPS is:

$\mathcal{G} = \langle G(x, s) = if(clear(x, s), s, puttable(topof(x, s), G(topof(x, s), s))) \rangle$
$t_0 = G(C, list(clear(A), on(A, B), on(B, C), ontable(C)))$.

The first variable x holds the name of the block to be cleared, for example
C. The situation variable s holds a description of a planning state as list of
literals. Predicate clear, selector topof, and operator puttable are assumed to
be predefined functions. clear(x, s) is true if clear(x) ∈ s; topof(x, s) returns
y for on(y, x) ∈ s; puttable(x, s) deletes on(x, y) from s and adds the literals
clear(y) and ontable(x) to s.
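
The verbal definitions of the three predefined functions are enough to make the clearblock RPS executable. The following sketch assumes that a state is a list of literals such as (on A B); the name clear-p (used instead of clear) and the concrete list handling are choices of this illustration, not the definitions used in the system.

```lisp
;; Sketch only: executable reading of the clearblock RPS.
;; A state S is a list of literals, e.g. ((clear A) (on A B) (ontable B)).

(defun clear-p (x s)
  "True if the literal (clear x) is contained in state S."
  (member (list 'clear x) s :test #'equal))

(defun topof (x s)
  "Return the block y with (on y x) in S, or NIL if there is none."
  (second (find-if (lambda (lit)
                     (and (eq (first lit) 'on) (eq (third lit) x)))
                   s)))

(defun puttable (x s)
  "Delete (on x y) from S and add (clear y) and (ontable x)."
  (let ((lit (find-if (lambda (l)
                        (and (eq (first l) 'on) (eq (second l) x)))
                      s)))
    (if lit
        (list* (list 'clear (third lit))
               (list 'ontable x)
               (remove lit s :test #'equal))
        s)))

(defun clearblock (x s)
  "G(x, s) = if(clear(x, s), s, puttable(topof(x, s), G(topof(x, s), s)))."
  (if (clear-p x s)
      s
      (puttable (topof x s) (clearblock (topof x s) s))))

;; Main program t0 for the three block problem:
(clearblock 'C '((clear A) (on A B) (on B C) (ontable C)))
```

The sketch does not guard against inconsistent states (for example a block that is neither clear nor covered by another block), for which the recursion would not terminate.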

The Tower of Hanoi domain is an example for a recursive function relying
on a hidden variable (see sect. 7.3.4.3). In figure 7.18, an initial tree is given.
The underlying RPS is:

$\mathcal{G} = \langle hanoi(n, x, y, z) = if(eq1(n), move(x, y), ap(hanoi(pred(n), x, z, y), move(x, y), hanoi(pred(n), z, y, x))) \rangle$

$t_0 = hanoi(3, a, b, c)$.

The function symbol ap (which can be interpreted as append) is used to 
combine two calls of Hanoi with a move operation. 
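
Read as a program, this RPS can be run directly. The sketch below assumes that move returns a one-element list containing the move and that ap is interpreted as append, as suggested above; both choices are assumptions of this illustration.

```lisp
;; Sketch only: the Tower of Hanoi RPS with ap interpreted as APPEND.
(defun move (x y)
  "A single move from peg X to peg Y, wrapped in a list for APPEND."
  (list (list 'move x y)))

(defun hanoi (n x y z)
  "Move N discs from peg X to peg Y, using peg Z as auxiliary peg."
  (if (= n 1)
      (move x y)
      (append (hanoi (1- n) x z y)
              (move x y)
              (hanoi (1- n) z y x))))

;; Main program t0:
(hanoi 3 'a 'b 'c)
;; => a list of 7 moves, starting with (MOVE A B)
```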
Fig. 7.17. Initial Tree for Clearblock

Fig. 7.18. Initial Tree for Tower of Hanoi

Parameter $z$ occurs only in substitutions in the recursive calls. Therefore,
the maximal pattern contains only three variables:

$t_G = if(eq1(x_1), move(x_2, x_3), ap(G, move(x_2, x_3), G))$
and the fourth variable z is included after determining the substitution terms. 

In the following chapter, recursive generalizations for these and further 
planning problems will be discussed. 



8. Transforming Plans into Finite Programs 



"[...] I guess I can put two and two together." "Sometimes the answer's four,"
I said, "and sometimes it's twenty-two. [...]"

— Nick Charles to Morelli in: Dashiell Hammett, The Thin Man, 1932 



Now we are ready to combine universal planning as described in chapter 
3 and induction of recursive program schemes as described in chapter 7. 
In this chapter, we introduce an approach to transform plans generated by 
universal planning into finite programs which are used as input to our folder. 
On the one hand, we present an alternative approach to realizing the first 
step of inductive program synthesis as described in section 6.3.4 in chapter 
6 - using AI planning as basis for generating program traces. On the other 
hand, we demonstrate that inductive program synthesis can be applied to 
the generation of recursive control rules for planning - as discussed in section 
2.5.2 in chapter 2. As a reminder of our overall goal introduced in chapter 1,
figure 8.1 gives an overview of learning recursive rules from some initial
planning experience.



Fig. 8.1. Induction of Recursive Functions from Plans (phases shown: Domain
Exploration, Control Knowledge Learning)



U. Schmid: Inductive Synthesis of Functional Programs, LNAI 2654, pp. 227-269, 2003. 
© Springer-Verlag Berlin Heidelberg 2003 
While plan construction and folding are straight-forward and can be
dealt with by domain-independent, generic algorithms, plan transformation 
is knowledge dependent and therefore the bottleneck of our approach. What 
we present here is work in progress. As a consequence, the algorithms pre- 
sented in this chapter are stated rather informally or given as part of the Lisp 
code of the current implementation and we rely heavily on examples. We can 
demonstrate that our basic idea - transformation based on data type infer- 
ence - works for a variety of domains and hope to elaborate and formalize 
plan transformation in the near future. 

In the following, we first give an overview of our approach (sect. 8.1), then 
we introduce decomposition, type inference, and introduction of situation 
variables as components of plan transformation (sect. 8.2). In the remaining 
sections we give examples for plans over totally ordered sequences (sect. 8.3), 
sets (sect. 8.4), partially ordered lists (sect. 8.5), and more complex types 
(sect. 8.6).^1



8.1 Overview of Plan Transformation 

8.1.1 Universal Plans 

Recall the construction of universal plans for a small finite set of states
and a fixed goal, as introduced in chapter 3. A universal plan is a DAG
which represents an optimal, totally ordered sequence of actions (operations)
to transform each input state considered in the plan into the desired output
(goal) state. Each state is a node in the plan and is typically represented as
a set of atoms over domain objects (constants). Actions in a plan are applied
to the current state in working memory. For example, for a given state
{on(A, B), on(B, C), clear(A), ontable(C)}, application of action puttable(A)
results in {on(B, C), clear(A), clear(B), ontable(A), ontable(C)}.

8.1.2 Introducing Data Types and Situation Variables 

To transform a plan into a finite program, it is necessary to introduce an 
order over the domain (which can be gained by introducing a data type) 
and to introduce a variable which holds the current state (i. e., a situation 
variable). 

Data type inference is crucial for plan transformation: The universal plan 
already represents the structure of the searched-for program, but it does not 
contain information about the order of the objects of its domain. Typically for 

^1 This chapter is based on the previous publications Schmid and Wysotzki (2000b)
and Schmid and Wysotzki (2000a). An approach which shares some aspects 
with the plan transformation presented here and is based on the assumption of 
linearizability of goals is presented in Wysotzki and Schmid (2001). 
planning, domain objects are represented as sets of constants. For example,
in the clearblock problem (sect. 3.1.4.1 in chap. 3) we have blocks A, B, and
C. As discussed in chapter 6, approaches to deductive as well as inductive
functional program synthesis rely on a “constructive” representation of the
domain. For example, Manna and Waldinger (1987) introduce the hat-axiom
for referring to a block which is lying on another block; Summers (1977) relies
on a predefined complete partial order over lists. Our aim is to infer the data
structure for a given planning domain. If the data type is known, constant
objects in the plan can be replaced by constructive expressions.

Furthermore, to transform a plan into a program term, a situation variable 
(see sect. 2.2.2 in chap. 2 and sect. 6.2.1.2 in chap. 6) must be introduced.
While a planning algorithm applies an instantiated operator to the current 
state in working memory (that is, a “global” variable), interpretation of a 
functional program depends only on the instantiated parameters (that is, 
“local” variables) of this program term (Field & Harrison, 1988). Introducing 
a situation variable in the plan makes it possible to treat a state as valuated 
parameter of an expression. That is, the primitive operators are now applied 
to the set of literals held by the situation variable. 

8.1.3 Components of Plan Transformation 

Overall, plan transformation consists of three steps, which we will describe 
in the following section: 

Plan Decomposition: If a universal plan consists of parts with different sets
of action names (e. g., a part where only unload(x, y) is used and another
part where only load(x, y) is used), the plan is split into sub-plans.
In this case, the following transformation steps are performed for each
sub-plan separately and a term giving the structure of function-calls is
generated from the decomposition structure.

Data Type Inference: The ordering underlying the objects involved in action 
execution is generated from the structure of the plan. From this order, 
the data type of the domain is inferred. 

Introduction of Situation Variables: The plan is re-interpreted in situation 
calculus and rewritten as nested conditional expression. 

For information about the global data structures and central components
of the algorithm, see appendix A.7.

8.1.4 Plans as Programs 

As stated above, a universal plan represents the optimal transformation se- 
quences for each state of the finite problem domain for which the plan was 
constructed. To execute such a plan, the current input state can be searched 
in the planning graph by depth-first or breadth-first search starting from the 
root and the actions along the edges from the current input state to the
root can be extracted (Schmid, 1999). If the universal plan is transformed 
into a finite program term, this program can be evaluated by a functional 
eval-apply interpreter. For each input state, the resulting transformation se- 
quence should correspond exactly to the sequence of actions associated with 
that state in the universal plan. 

The searched-for finite program is already implicitly given in the plan
and we have to extract it by plan transformation. A plan can be considered
as a finite program for transforming a fixed set of inputs into the desired
output by means of applying a totally ordered sequence of actions to an input
state, resulting in a state fulfilling the top-level goals. A plan constructed by
backward search with the state(s) fulfilling the top-level goals as root can be
read top-down as: IF the literals at the current node are true in a situation
THEN you are done after executing the actions on the path from the current
node to the root ELSE go to the child node(s) and recur.^2 Our goal is to
extract the underlying program structure from the plan. To interpret the plan
as a (functional) program term, states are re-interpreted as boolean opera- 
tors: All literals of a state description which are involved in transformation 
- the “footprint” (Veloso, 1994) - are rewritten with the predicate symbol 
as boolean operator introducing a situation variable as additional argument. 
Each boolean operator is rewritten as a function which returns true, if some 
proposition holds in the current situation, and false otherwise. Additionally, 
the actions are extended by a situation variable, thus, the current (partial) 
description of a situation can be passed through the transformations. Fi- 
nally, to represent the conditional statement given above, additional nodes 
only containing the situation variable are introduced for all cases, where the 
current situation fulfills a boolean condition. In this case, the current value 
of the situation variable is returned. 

We define plans as programs in the following way: 

Definition 8.1.1 (Plan as Program). Each node S (set of literals) in
plan $\Theta$ is interpreted as conjunction B of boolean expressions. The planning
tree can now be interpreted as nested conditional: $\Theta(s) =$ IF $B(s)$ THEN $t_1$
ELSE $t_2$ with $t_1, t_2 ::= s \mid o(\Theta'(s))$, where s is a situation variable, o the
action given at the edge from B to a child node, and $\Theta'$ the sub-plan with this
child node as root.

The restriction to binary conditions “if-then-else” is no limitation in
expressiveness. Each n-ary condition can be rewritten as nested binary condition:
(cond (x1 t1) (x2 t2) (x3 t3) ... (xn tn)) ==

(if x1 t1 (if x2 t2 (if x3 t3 (if ... (if xn tn Ω))))).
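
The rewriting of an n-ary condition into nested binary conditions can be sketched in a few lines. The function name cond-to-if is hypothetical, and the symbol OMEGA for the final, undefined alternative is an assumption of this sketch.

```lisp
;; Sketch only: rewrite a list of (test term) clauses into nested IFs.
(defun cond-to-if (clauses)
  "CLAUSES is a list of two-element lists (test term)."
  (if (null clauses)
      'omega                      ; no clause left: undefined
      (destructuring-bind (test term) (first clauses)
        (list 'if test term (cond-to-if (rest clauses))))))

(cond-to-if '((x1 t1) (x2 t2) (x3 t3)))
;; => (IF X1 T1 (IF X2 T2 (IF X3 T3 OMEGA)))
```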

In general, boolean expressions B(s) can involve propositions explicitly
given in a situation as well as additional constraints as described in chapter

^2 An interpreter function for universal plans is given in (Wysotzki & Schmid,
2001).
4. If the plan results in a term IF $B(s)$ THEN $s$ ELSE $o(\Theta'(s))$, the problem
is linear; if both the then- and the else-part involve operator application, the
problem is more complex (resulting in a tree recursion). We will see below that
for some problems a complex structure can be collapsed into a linear one as a
result of data type introduction.

8.1.5 Completeness and Correctness 

Starting point for plan transformation is a complete and correct universal
plan (see chap. 3). The result of plan transformation is a finite program
term. The completeness and correctness of this program can be checked by
using each state in the original plan as input to the finite program and checking
(1) whether the interpretation of the program results in the goal state and (2)
whether the number of operator applications corresponds to the number of
edges on the path from the given input state to the root of the plan. Of course,
because folding is an inductive step, we cannot guarantee the completeness
and correctness of the inferred recursive program. The recursive program
could be empirically validated by presenting arbitrary input states of the
domain. For example, if a plan for sorting lists of up to three elements was
generalized into a recursive sorting program, the user could test this program
by presenting lists with four or more elements as input and inspecting the
resulting output.



8.2 Transformation and Type Inference 

8.2.1 Plan Decomposition 

As an initial step, the plan might be decomposed in uniform sub-plans: 

Definition 8.2.1 (Uniform Sub-Plan). A sub-plan is uniform if it contains
only fixed, regular sequences of operator-names $(o_1 \ldots o_n)$ with $n \geq 1$.

The simplest uniform sub-plan is a sequence of steps where each action
involves an identical operator-name (see fig. 8.2.a). An example for such
a plan is the clearblock plan given in figure 3.2 which consists of a linear
sequence of puttable actions. It could also be possible that some operators are
applied in a regular way, for example drill-hole, polish-object (see fig. 8.2.b).
Single operators or regular sequences can alternatively occur in more complex
planning structures (see fig. 8.2.c). An example for such a plan is the sorting
plan given in figure 3.7 which is a DAG where each arc is labelled with a
swap action.

A plan can contain uniform sub-plans in several ways: The simplest
way is that the plan can be decomposed level-wise (see fig. 8.3.a). This is, for
example, the case for the rocket domain (see plan in fig. 3.5): The first levels
of the DAG only contain unload actions, followed by a single move-rocket
action, followed by some levels which only contain load actions. In general,
sub-plans can occur at any position in the planning structure as subgraphs
(see fig. 8.3.b).



Fig. 8.2. Examples of Uniform Sub-Plans: (a) a sequence with a single operator
(o1 ...), (b) a regular sequence of operators (o1 ...) and (o2 ...), (c) a more
complex planning structure



We have only implemented a very restricted mode for this initial plan 
decomposition: single operators and level-wise splitting (see appendix A.8).
A full implementation of decomposition involves complex pattern-matching, 
as it is realized for identifying sub-programs in our folder (chap. 7). Level- 
wise decomposition can result in a set of “parallel” sub-plans which might be 
composed again during later planning steps. Parallel sub-plans occur, if the 
“parent” sub-plan is a tree, that is, it terminates with more than one leaf. 
Each leaf becomes the root of a potential subsequent sub-plan. 

In the current implementation, we return the complete plan if different 
operators occur at the same level. A reasonable minimal extension (still avoid- 
ing complex pattern-matching as described in chap. 7) would be to search for 
sub-plans fulfilling our simple splitting criterion at lower levels of the plan.
But up to now, this case only occurred for complex list problems (such as 
tower with 4 blocks, see below) and in such cases, a minimal spanning tree 
is extracted from the plan. 
Fig. 8.3. Uniform Plans as Subgraphs



If decomposition results in more than one sub-plan, an initial skeleton
for the program structure is generated over the (automatically generated)
names of the sub-plans (see also chap. 7), which are initially associated with
the partial plans and finally with recursive functions. If folding succeeds,
the names are extended by the associated lists of parameters. For example,
a structure (p1 (p2 (p3))) could be completed to (p1 arg1 (p2 arg2 arg3 (p3
arg4 s))) where the last argument of each sub-program with name pi is a
situation variable. For all arguments argi, the initial values as given in the
finite program are known - as a result from folding where parameters and
their initial instantiations are identified in an initial program (chap. 7).

8.2.2 Data Type Inference 

The central step of plan transformation is data type inference.^3 The structure
of a (sub-)plan is used to generate a hypothesis about the underlying data
type. This hypothesis invokes certain data type specific concepts, which the
system subsequently tries to identify in the plan, and certain rewrite-steps which
are to be performed on the plan. If the data type specific concepts cannot be
identified from the plan, plan transformation fails.

Definition 8.2.2 (Data Type). A data type $\tau$ is a collection of data items,
with designated basic items $\bot$ (together with a “bottom”-test) and operations

^3 Concrete and abstract data types are for example introduced in (Ehrig & Mahr,
1985).
(constructors) such that all data items can be generated from basic items by
operation-application. The constructor function determines the structure of
the data type with $\bot < c(x, \bot) < c(x', c(x, \bot)) \ldots$. If x is empty or unique,
the data type is simple and the (countable, infinite) set of elements belonging
to $\tau$ is totally ordered. The structure of data belonging to complex types is
usually a partial order. For complex types, additionally selector functions
$el(c(x, s)) = x$ and $rs(c(x, s)) = s$ are defined.

An example for a data type is list with the empty list as bottom element,
cons(x, l) as list-constructor, head(l) as selector for the first element of a list,
and tail(l) as selector for the “rest” of the list, that is, the list without the
first element.

The data type hypotheses are checked against the plan ordered by
increasing complexity:

- Is the plan a sequence of steps (no branching in the plan)?
  Hypothesis: Data Type is Sequence
- Does the plan consist of paths with identical sets of actions?
  Hypothesis: Data Type is Set
- Is the plan a tree?
  Hypothesis: Data Type is List or compound type
- Is the plan a DAG?
  Hypothesis: Data Type is List or compound type.

Data type inference for the different plan structures is discussed in detail 
below. After the plan is rewritten in accordance to the data type, the or- 
der of the operator-applications and the order over the domain objects are 
represented explicitly. 

Explicit order over the domain objects is achieved by replacing object
names by functional expressions (selector functions) referring to objects in
an indirect way. Referring to objects by functions f(t), where t is a ground
term (a constant, such as the bottom-element, or a functional expression
over a constant) makes it possible to deal with infinite domains while still
using finite, compact representations (Geffner, 2000). For example, (pick oset)
can represent a specific object in an object list of arbitrary length.

8.2.3 Introducing Situation Variables 

In the final step of plan transformation, the remaining literals of each state
and the actions are extended by a situation variable s as additional argument
and the plan is rewritten as a conditional expression as defined in definition
8.1.1. An abbreviated version of the rewriting-algorithm is given in appendix
A.9.^4



^4 This final step is currently only implemented for linearized plans.
8.3 Plans over Sequences of Objects

A plan which consists of a sequence of actions (without branching) is assumed
to deal with a sequence of objects. For sequences, there must exist a single
bottom element, which is identifiable from the top-level goal(s) together with
the goal-predicate(s) as bottom-test. The total order over domain objects is
defined over the arguments of the actions from the top (root) of the sequence
to the leaf.

Definition 8.3.1 (Sequence). Data type sequence is defined as:
$seq = \bot \mid c(seq)$ with



For a plan - or sub-plan - with hypothesized data type sequence, data 
type introduction works as described in the algorithm given in table 8.1. 



Table 8.1. Introducing Sequence

- If the plan starts at level 0:
  - If the plan is not a single step:
    • If there is a single top-level goal (p a1 ... an) set bottom-test to p,
      else fail.
    • Set type to seq.
    • Generate the sequence:
      Collect the argument-tuple of each action along the path from the root
      to the leaf.
      If the tuple consists of a single argument ai, keep it;
      otherwise, remove all arguments aj which are constant over the sequence.
    • If the sequence consists of single elements and if each element occurs
      as argument of p,
      proceed with sequence = (e0 ... em) and set bottom to e0,
      else fail.
    • Construct an association list ((e1 (succ e0)) ... (em (succ^m e0))).
      For (e0, e1) check whether the state on level 1 contains a predicate
      q(args) with e0, e1 ∈ args at positions (pos e0 1 q) and (pos e1 1 q);
      if yes, proceed, else fail.
      For each (ei, ei+1) of the sequence with i = 1, ..., m check whether
      q(args) with ei, ei+1 ∈ args exists at level i+1 with
      pos(ei, i+1, q) = pos(e0, 1, q) = pi and
      pos(ei+1, i+1, q) = pos(e1, 1, q) = pj.
      If yes, generate a function (succ ei) = ej if (q args) with ei at pi in q
      and ej at pj in q,
      else fail.
    • Introduce data type sequence into the plan:
      For each state, keep only bottom-test predicate (p a1 ... an).
      Replace arguments of p and of actions by (succ^i e0) in accordance to
      the association list.
  - If the plan is a single step: identify bottom-test and bottom as above,
    reduce states to the bottom-test predicate.
- If the plan starts at a level > 0: an “intermediate” goal must be identified;
  afterwards, proceed as above.
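
The position check at the heart of this algorithm can be sketched as follows. The sketch assumes that the sequence (e0 ... em), including the bottom element, has already been generated and that the states of the plan are given level by level, root first. The function names find-link and infer-succ and the result format (pred x-pos y-pos) are hypothetical.

```lisp
;; Sketch only: find the predicate q that links consecutive elements
;; of the sequence at the same argument positions on every level.

(defun find-link (a b state)
  "Return (pred pos-a pos-b) for the first literal in STATE that
contains both A and B, or NIL if there is none."
  (dolist (lit state nil)
    (let ((pa (position a lit))
          (pb (position b lit)))
      (when (and pa pb)
        (return (list (first lit) pa pb))))))

(defun infer-succ (sequence levels)
  "SEQUENCE is (e0 ... em); LEVELS is the list of states, level 0 first.
Returns (pred x-pos y-pos) such that (succ x) = y if pred holds with
x at x-pos and y at y-pos, or NIL if no consistent link exists."
  (let ((patterns (mapcar #'find-link
                          sequence          ; e_i
                          (rest sequence)   ; e_i+1
                          (rest levels))))  ; state on level i+1
    (when (and patterns
               (first patterns)
               (every (lambda (p) (equal p (first patterns))) patterns))
      (first patterns))))

;; Three-object unstack plan, root (goal state) first:
(infer-succ '(O3 O2 O1)
            '(((clear O1) (clear O2) (clear O3))
              ((on O2 O3) (clear O1) (clear O2))
              ((on O1 O2) (on O2 O3) (clear O1))))
;; => (ON 2 1)
```

The result says that the successor of x is the object y for which (on y x) holds, with x at argument position 2 and y at position 1 of the literal.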
Note that inference is performed over a plan which is generated by our
Lisp-implemented planning system. Therefore, expressions are represented within
parentheses. For example, the literal representing that a block A is lying on
another block B is represented as (on A B); the action to put block A on the
table is represented as (puttable A). The restriction to a single top-level goal
in the algorithm can be extended to multiple goals by a slight modification.^5

For the application of an inferred recursive control rule, an additional
function for identifying successor-elements from the current state has to be
provided as described in the algorithm of table 8.1. The program code for
generating this function is given in figure 8.4. The first function (succ-pattern)
represents a pre-defined pattern for the successor-function. For a concrete plan,
this pattern must be instantiated in accordance with the predicates occurring in
the plan. For example, for the clearblock problem (see chap. 3 and below),
the successor function topof(x) = y is constructed from predicates on(y, x).



; pattern for getting the succ of a constant
; pred has to be replaced by the predicate-name of the rewrite-rule
; x-pos has to be replaced by a list-selector (position of x in pred)
; y-pos ditto (position of y = (succ x) in pred)
(setq succ-pattern '(defun succ (x s)
                      (cond ((null s) nil)
                            ((and (equal (first (car s)) pred)
                                  (equal (nth x-pos (car s)) x))
                             (nth y-pos (car s)))
                            (t (succ x (cdr s))))))

; use a pattern for calculating the successor of a constant from a
; planning state (succ-pattern) and replace the parameters for the
; current problem
; this function has to be saved so that the synthesized program
; can be executed
(defun transform-to-fct (r)
  ; r: ((pred ...) (y = (succ x)))
  ; pred = (first (car r))
  ; find variable-names x and y and find their positions in (pred ...)
  ; replace pred, x-pos, y-pos
  ; subst (instead of nsubst) leaves succ-pattern itself unchanged,
  ; so the pattern can be instantiated more than once
  (let* ((r-pred (first (car r)))
         (r-x-pos (position (second (third (second r))) (first r)))
         (r-y-pos (position (first (second r)) (first r))))
    (subst (cons 'quote (list r-pred)) 'pred
           (subst r-x-pos 'x-pos (subst r-y-pos 'y-pos succ-pattern)))))

Fig. 8.4. Generating the Successor-Function for a Sequence
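
How an instantiated pattern behaves on a concrete state can be illustrated with a self-contained sketch. It does not reuse the code of figure 8.4; the names *succ-template* and instantiate-succ are hypothetical, and sublis is used here only as one possible way to fill in the three placeholders.

```lisp
;; Sketch only: instantiate a successor pattern and apply it to a state.
(defparameter *succ-template*
  '(defun succ (x s)
     (cond ((null s) nil)
           ((and (equal (first (car s)) pred)
                 (equal (nth x-pos (car s)) x))
            (nth y-pos (car s)))
           (t (succ x (cdr s))))))

(defun instantiate-succ (pred x-pos y-pos)
  "Return a DEFUN form for SUCC with the placeholders filled in."
  (sublis `((pred . (quote ,pred))
            (x-pos . ,x-pos)
            (y-pos . ,y-pos))
          *succ-template*))

;; For literals (on y x): x is at position 2, y = (succ x) at position 1.
(eval (instantiate-succ 'on 2 1))

(succ 'O3 '((on O2 O3) (clear O1) (clear O2)))   ; => O2
(succ 'O1 '((on O2 O3) (clear O1) (clear O2)))   ; => NIL
```

Because sublis returns a new tree, the template stays unchanged and can be instantiated again for another predicate.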



Unstacking Objects. A prototypical example for plans over a sequence of 
objects is unstacking objects - either to put all objects on the ground or 

^5 Data type sequence must be introduced as first step. If the sequence of objects
in the plan is identified, it is also identified which predicate characterizes the 
elements of this sequence. The top-level goals must include this predicate with 
the bottom-element as argument and consequently can be selected as “bottom- 
test” predicate. 
to clear a specific object located somewhere in the stack. Such a planning
domain was specified by clearblock as presented in section 3.1.4.1 in chapter
3. Here we use a slightly different domain, omitting the predicate ontable and
with operator puttable named unstack. The domain specification and a plan
for unstack are given in figure 8.5.



D = { ((clear O1) (clear O2) (clear O3)),
      ((on O2 O3) (clear O1) (clear O2)),
      ((on O1 O2) (on O2 O3) (clear O1)) }

G = {(clear O3)}

O = {unstack} with
(unstack ?x)
PRE {(clear ?x), (on ?x ?y)}
ADD {(clear ?y)}
DEL {(on ?x ?y)}

Plan (root to leaf):

((CLEAR O1) (CLEAR O2) (CLEAR O3))
      (UNSTACK O2)
((ON O2 O3) (CLEAR O1) (CLEAR O2))
      (UNSTACK O1)
((ON O1 O2) (ON O2 O3) (CLEAR O1))

Fig. 8.5. The Unstack Domain and Plan

The protocol of plan-transformation is given in figure 8.6. After identifying
the data type sequence, the crucial step is the introduction of the
successor-function: (succ x) = y ≡ (on y x), which represents the “block lying
on top of block x”. While such functions are usually pre-defined (Manna &
Waldinger, 1987; Geffner, 2000; Wysotzki & Schmid, 2001), we can infer them
from the universal plan. The “constructively” rewritten plan where data type
sequence is introduced is given in figure 8.7. The transformation information
stored for the finite program is given in appendix CC.3.

The finite program term which can be constructed from the term given 
in figure 8.7 is: 

(IF (CLEAR O3 S)
    S
    (UNSTACK (SUCC O3 S)
      (IF (CLEAR (SUCC O3 S) S)
          S
          (UNSTACK (SUCC (SUCC O3 S) S)
            (IF (CLEAR (SUCC (SUCC O3 S) S) S)
                S
                OMEGA))))).

An RPS (see chap. 7) generalizing this unstack term consists of the equation
(unstack-all o s) = (if (clear o s) s (unstack (succ o s) (unstack-all (succ o s) s)))
together with the main term t0 = (unstack-all o s) for some constant o and
some set of literals s. The executable program is given in figure 8.8.
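The relation between the recursive equation and the finite program term can be made concrete by unfolding the equation a fixed number of times. The sketch below is only an illustration: finite-unstack-term is a hypothetical helper, and the cut-off convention (the else-branch of the last test becomes OMEGA) is chosen so that depth 3 reproduces the finite term shown above.

```lisp
;; Hypothetical sketch: build the finite program term with DEPTH clear-tests.
;; The symbol S stands for the situation variable, OMEGA for "undefined".
(defun finite-unstack-term (o depth)
  (let ((next `(succ ,o s)))
    `(if (clear ,o s)
         s
         ,(if (<= depth 1)
              'omega                       ; plan gives no further information
              `(unstack ,next
                        ,(finite-unstack-term next (1- depth)))))))
```

Calling (finite-unstack-term 'o3 3) yields the term given for the three-block problem; larger depths correspond to plans over longer sequences.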

Plans consisting of a linear sequence of operator applications over a
sequence of objects in general result in generalized control knowledge in form
of linear recursive functions. In standard programming domains (over numbers,
lists), a large group of problems is solvable by functions of this recursion
class. Examples are given in table 8.2.

+++++++++ Transform Plan to Program ++++++++++
1st step: decompose by operator-type
Single Plan
(SAVE SUBPLAN P1)

2nd step: Identify and introduce data type
(INSPECTING P1) Plan is linear
(SINGLE GOAL-PREDICATE (CLEAR O3))
Plan is of type SEQUENCE
(SEQUENCE IS O3 O2 O1)
(IN CONSTRUCTIVE TERMS THAT'S (O2 (SUCC O3)) (O1 (SUCC (SUCC O3))))
Building rewrite-cases:
(((ON O2 O3) (O2 = (SUCC O3))) ((ON O1 O2) (O1 = (SUCC O2))))
(GENERALIZED RULE IS (ON |?x1| |?x2|) (|?x1| = (SUCC |?x2|)))
Storage as LISP-function
Reduce states to relevant predicates (footprint)
((CLEAR O3))
((CLEAR (SUCC O3)))
((CLEAR (SUCC (SUCC O3))))

3rd step: Transform plan to program
Show Plan as Program? y

Fig. 8.6. Protocol for Unstack




Fig. 8.7. Introduction of Data Type Sequence in Unstack 



It is simple and straightforward to generalize over linear plans. For this
class of plans, our approach automatically provides complete and correct
control rules. As a result, planning can be avoided completely, and the
transformation sequence for solving an arbitrary problem involving an arbitrary
number of objects can be computed in linear time!

; Complete recursive program for the UNSTACK problem
; call (unstack-all <oname> <state-description>)
; e.g. (unstack-all 'o3 '((on o1 o2) (on o2 o3) (clear o1)))

; generalized from finite program generated in plan-transform
(defun unstack-all (o s)
   (if (clear o s)
       s
       (unstack (succ o s) (unstack-all (succ o s) s))
   ))

(defun clear (o s)
   (member (list 'clear o) s :test 'equal)
)

; inferred in plan-transform
(DEFUN SUCC (X S)
   (COND ((NULL S) NIL)
         ((AND (EQUAL (FIRST (CAR S)) 'ON)
               (EQUAL (NTH 2 (CAR S)) X))
          (NTH 1 (CAR S)))
         (T (SUCC X (CDR S)))))

; explicit implementation of "unstack"
; in connection with DPlan: apply unstack-operator on state s and return
; the new state
(defun unstack (o s)
   (cond ((null s) nil)
         ((and (equal (first (car s)) 'on) (equal (second (car s)) o))
          (cons (cons 'clear (list (third (car s)))) (cdr s))
         )
         (T (cons (car s) (unstack o (cdr s))))
   ))

Fig. 8.8. LISP-Program for Unstack



Table 8.2. Linear Recursive Functions

(unstack-all x s) == (if (clear x) s (unstack (succ x) (unstack-all (succ x) s)))
(factorial x)     == (if (eq0 x) 1 (mult x (factorial (pred x))))
(sum x)           == (if (eq0 x) 0 (plus x (sum (pred x))))
(expt m n)        == (if (eq0 n) 1 (mult m (expt m (pred n))))
(length l)        == (if (null l) 0 (succ (length (tail l))))
(sumlist l)       == (if (null l) 0 (plus (head l) (sumlist (tail l))))
(reverse l)       == (if (null l) nil (append (reverse (tail l)) (list (head l))))
(append l1 l2)    == (if (null l1) l2 (cons (head l1) (append (tail l1) l2)))
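The entries of table 8.2 share one shape: a test, a base value, and a combination of the current argument with the result of a single recursive call. As a rough illustration (linear-rec and the three instances are hypothetical names introduced here, not part of the system), this common scheme can be written once and instantiated:

```lisp
;; Hypothetical sketch of the linear recursion scheme
;;   f(x) = (if (test x) (base x) (combine x (f (step x))))
(defun linear-rec (test base combine step)
  (labels ((f (x)
             (if (funcall test x)
                 (funcall base x)
                 (funcall combine x (f (funcall step x))))))
    #'f))

;; Three instances corresponding to rows of the table.
(defparameter *factorial*
  (linear-rec #'zerop (constantly 1) #'* #'1-))

(defparameter *sumlist*
  (linear-rec #'null (constantly 0)
              (lambda (l acc) (+ (first l) acc))
              #'rest))

(defparameter *reverse*
  (linear-rec #'null (constantly nil)
              (lambda (l acc) (append acc (list (first l))))
              #'rest))
```

Each instance contains exactly one recursive call per unfolding, which is the property that makes the function class linear.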
8.4 Plans over Sets of Objects

A plan which has a single root and a single leaf, where the sets of actions
along all paths from root to leaf are identical, is assumed to deal with a set
of objects. For sets, there has to be a complex data object which is a set of
elements (constants of the planning domain), a bottom-element - which is
inferred from the elements involved in the top-level goals -, a bottom-test
which has to be an inferred predicate over the set, and two selectors - one for
an element of a set (pick) and one for a set without some fixed element (rst).
The partial order over sets with maximally three elements is given in figure
8.9; this order corresponds to the sub-plans for unload or load in the rocket
domain (sect. 3.1.4.2 in chap. 3). If pick and rst are defined deterministically
(e.g., by list-selectors), the partial order gets reduced to a total order.




{e1,e2,e3} 



Fig. 8.9. Partial Order of Set 



Definition 8.4.1 (Set). Data type set is defined as:
set = ⊥ | c(e, set) with

empty(set) = true if set = ⊥, false otherwise
pick(set) = some e ∈ set
rst(set) = set \ e for some e ∈ set.

Because the complex data object is inferred from the top-level goals (given in
the root of the plan), we typically infer an “inverted” order - with the largest
set (containing all objects which can be element of set) as bottom element
and c(e, set) = rst(set) as a de-structor.

For a plan - or sub-plan - with hypothesized data type set, data type
introduction works as described in the algorithm given in table 8.3. Because
we collapse such a plan to one path of a set of paths, this algorithm completely
contains the case of sequences (tab. 8.1). Note that collapsing plans with
underlying data type set corresponds to the idea of “commutative pruning”
as discussed in (Haslum & Geffner, 2000).

Table 8.3. Introducing Set

- Collapse plan to one path (implemented as: take the “leftmost” path).
- Generate a complex data object like Generate Sequence in tab. 8.1.
  sequence = (e1 ... em) is interpreted as set and bottom is instantiated with
  CO = (e1 ... em).
  A function for generating CO from the top-level goals (make-co) is provided.
- A generalized predicate (g* args) with CO ∈ args is constructed by collecting
  all predicates (g args) with o ∈ CO ∧ o ∈ args and replacing o by CO. For a
  plan starting at level 0, g has to be a top-level goal and all top-level goals
  have to be covered by g*; for other plans, g has to be in the current root node.
- A function for testing whether g* holds in a state is generated as bottom-test.
- Introduce the data type into the plan:
  For each state keep only bottom-test predicate (g* args) with CO' ⊆ CO and
  CO' ∈ args. Introduce set-selectors for arguments of g* by replacing CO' by
  (rst^i CO) and afterwards replacing action-arguments by (pick (rst^i CO)) with
  (rst^i CO) occurring as argument of g* of the parent-node.
  (pick set) and (rst set) are predefined by car and cdr.
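The first two steps of table 8.3 can be illustrated on a plan that is given simply as a list of root-to-leaf paths, each path being a list of actions. This representation and the names same-action-set-p, collapse-set-plan and complex-object are assumptions made for this sketch; the system itself works on the plan graph.

```lisp
;; Hypothetical sketch: test the set criterion and collapse the plan.
(defun same-action-set-p (paths)
  "True if every path uses the same set of actions as the leftmost path."
  (let ((reference (first paths)))
    (every (lambda (path)
             (and (subsetp path reference :test #'equal)
                  (subsetp reference path :test #'equal)))
           (rest paths))))

(defun collapse-set-plan (paths)
  "Keep the leftmost path if the plan is of type set, otherwise return NIL."
  (when (same-action-set-p paths)
    (first paths)))

(defun complex-object (path)
  "Collect the (first) argument of every action along PATH as the object CO."
  (mapcar #'second path))
```

For the six orderings of (unload o1), (unload o2) and (unload o3), the criterion holds and the collapsed path yields CO = (o1 o2 o3) when the leftmost path starts with o1.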



The program code for make-co, for generating the bottom-test, and for pick and
rst referred to in table 8.3 is given in figure 8.10. As described for data
type sequence above, the patterns of the selector functions are pre-defined and
the concrete functions are constructed by instantiating these patterns with
information gained from the given plan. Introduction of a new predicate with
a complex argument corresponds to the notion of “predicate-invention” in
Wysotzki and Schmid (2001).⁶

Oneway Transportation. A typical example for a domain with a set as underlying
data type are simple transportation domains, where some objects
must be loaded into a vehicle which moves them to a new location. Such a
domain is rocket (see sect. 3.1.4.2 in chap. 3). The rocket plan is in a first step
levelwise decomposed into sub-plans with uniform actions (see fig. 8.11).

Data type inference is done for each sub-plan. For both sub-plans (unload- 
all and load-all) there is a single root and a single leaf node and the sets of 
actions along all (six) possible paths from root to leaf are equal. From this 
observation we can conclude that the actual sequence in which the actions 
are performed is irrelevant, that is, the underlying data structure of both 
sub-plans is a set. Consequently, the (sub-) plan can be collapsed to one 
path. 

Introduction of the data type set involves the following steps:

- The initial set is constructed by collecting all arguments of the actions
  along the path from root to leaf. That is, the initial value for the set can
  be I = {O1, O2, O3} - when the left-most path of a sub-plan is kept. (If
  the involved operator has more than one argument, for example (unload
  <obj> <place>), the constant arguments are ignored.)

⁶ This kind of predicate invention should not be confused with predicate invention
in ILP (see sect. 6.3.3 in chap. 6). In ILP a new predicate must necessarily be
introduced if the current hypothesis covers some negative examples.

; make-co
; collect all objects covered by goal-preds p corresponding to the
; new predicate p*
; the new complex object is referred to as CO
; e.g. for rocket: (at o1 b) (at o2 b) --> CO = (o1 o2)
; how to make the complex object CO: use the pattern of the new
; predicate to collect all objects from the current top-level goal
;; use for call of main function, e.g.: (rocket (make-co ..) s)
(defun make-co (goal newpat)
   (cond ((null goal) nil)
         ((string< (string (caar goal)) (string (car newpat)))
          (cons (nth (position 'CO newpat) (car goal))
                (make-co (cdr goal) newpat)))
   ))

; rest(set) (implemented as for lists, otherwise pick/rest would not
; be guaranteed to be really complements)
;; named rst because rest is built-in
(defun rst (co)
   (cdr co)
)

; pick(set)
(defun pick (co)
   (car co)
)

; is newpred true in the current situation?
;; used for checking this predicate in the generalized function
;; e.g. (at* CO Place s)
;; newpat has to be replaced by the newpred name (e.g. at*)
;; pname has to be replaced by the original pred name (e.g. at)
(setq newpat '(defun newp (args s)
   (cond ((null args) T)
         ((and (null s) (not (null args))) nil)
         ((and (equal (caar s) pname)
               (intersection args (cdar s)))
          (newp (set-difference args (cdar s)) (cdr s)))
         (T (newp args (cdr s)))
   ))
)

(defun make-npfct (patname gpname)
   (subst patname 'newp (nsubst (cons 'quote (list gpname))
                                'pname newpat))
)

Fig. 8.10. Functions Inferred/Provided for Set

- A “generalized” predicate - corresponding to the empty-test of the data
  type - is invented by generalizing over all literals in the root-node with an
  element of the initial set I as argument. That is, for the unload-all sub-plan,
  the new predicate is p = (at* objSet B) with

  (at* objSet B) = true if ∀o ∈ objSet: (at o B) holds in the current state,
                   false otherwise.

  The original literals are replaced by p.

- The arguments of the generalized predicate are rewritten using the
  predefined rest-selector. We have the following replacements for the unload
  sub-plan (given from root to leaf):

  (at* {O1,O2,O3} B) → (at* {O1,O2,O3} B)
  (at* {O1,O2} B)    → (at* (rst {O1,O2,O3}) B)
  (at* {O1} B)       → (at* (rst (rst {O1,O2,O3})) B)
  (at* { } B)        → (at* (rst (rst (rst {O1,O2,O3}))) B).

  The arguments of the actions are rewritten using the predefined pick-selector:

  (unload O1) → (unload (pick {O1,O2,O3}))
  (unload O2) → (unload (pick (rst {O1,O2,O3})))
  (unload O3) → (unload (pick (rst (rst {O1,O2,O3})))).

  Note that we define pick and rst deterministically (e.g., as head/tail or
  last/butlast). Although it is irrelevant which object is selected next, it is
  necessary that rst returns exactly the set of objects without the currently
  picked one.

The transformed sub-plan for unload-all is given in figure 8.12.a. After
introducing a data type into the plan, there is only one additional step necessary
to interpret the plan as a program - introducing a situation variable. Now
the plan can be read as a nested conditional expression (see fig. 8.12.b).

Plan transformation for the rocket problem results in the following program
structure

(rocket oset s) = (unload-all oset (move-rocket (load-all oset s)))

where unload-all and load-all are recursive functions which were generated by
our folding algorithm from the corresponding finite programs. Note that the
parameters involve only the set of objects - this information is the only one
necessary for control, while locations (A, B) and transport-vehicle (Rocket)
are additional information necessary for the dynamics of plan construction.
The protocol of plan-transformation is given in figure 8.13.

Recursive functions for unloading and loading objects are:

unload-all(oset, s) = if(at*(oset, B, s),
                         s,
                         unload(pick(oset), unload-all(rst(oset), s)))
load-all(oset, s)   = if(inside*(oset, Rocket, s),
                         s,
                         load(pick(oset), load-all(rst(oset), s)))

which are integrated in the “main” program (rocket oset s) given above.
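To see the control program at work, the recursive functions above can be run on states represented as lists of literals. The sketch below is a simplified illustration under stated assumptions: places are fixed to a and b, the vehicle is named rocket, operator preconditions are not checked, and the helper names holds, replace-literal and load-object are introduced here (load is already a standard Common Lisp function, hence the longer name).

```lisp
;; Hypothetical sketch of the rocket control program on literal lists.
(defun holds (lit s) (member lit s :test #'equal))

(defun at* (oset place s)                  ; generalized predicate over a set
  (every (lambda (o) (holds `(at ,o ,place) s)) oset))

(defun inside* (oset vehicle s)
  (every (lambda (o) (holds `(inside ,o ,vehicle) s)) oset))

(defun replace-literal (old new s)         ; DEL old, ADD new
  (cons new (remove old s :test #'equal)))

(defun load-object (o s) (replace-literal `(at ,o a) `(inside ,o rocket) s))
(defun unload (o s)      (replace-literal `(inside ,o rocket) `(at ,o b) s))
(defun move-rocket (s)   (replace-literal '(at rocket a) '(at rocket b) s))

(defun pick (oset) (car oset))             ; deterministic selectors
(defun rst (oset) (cdr oset))

(defun unload-all (oset s)
  (if (at* oset 'b s)
      s
      (unload (pick oset) (unload-all (rst oset) s))))

(defun load-all (oset s)
  (if (inside* oset 'rocket s)
      s
      (load-object (pick oset) (load-all (rst oset) s))))

(defun rocket (oset s)
  (unload-all oset (move-rocket (load-all oset s))))
```

A call such as (rocket '(o1 o2 o3) '((at o1 a) (at o2 a) (at o3 a) (at rocket a))) applies one load and one unload per object plus a single move, so the number of operator applications grows linearly with the size of oset.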
(b) "move-rocket" (single step)

((IN O3 R) (IN O1 R) (IN O2 R) (AT R B))
      (MOVE-ROCKET)
((AT R A) (IN O1 R) (IN O2 R) (IN O3 R))

Fig. 8.11. Sub-Plans of Rocket



((AT* (O1 O2 O3) B))
      (UNLOAD (PICK (O1 O2 O3)))
((AT* (RST (O1 O2 O3)) B))
      (UNLOAD (PICK (RST (O1 O2 O3))))
((AT* (RST (RST (O1 O2 O3))) B))
      (UNLOAD (PICK (RST (RST (O1 O2 O3)))))
((AT* (RST (RST (RST (O1 O2 O3)))) B))

(a)

Fig. 8.12. Introduction of the Data Type Set (a) and Resulting Finite Program
(b) for the Unload-All Sub-Plan of Rocket (Ω denotes “undefined”)



The recursive functions for loading and unloading all objects in some
arbitrary order learned from the rocket problem can be of use in many
transportation problems, as for example the logistics domain,⁷ which is still
one of the most challenging domains for planning algorithms (see the AIPS
planning competitions 1998 and 2000).

⁷ The logistics domain defines problems where objects have to be transported
from different places in and between different cities, using trucks within cities
and planes between cities.

+++++++++ Transform Plan to Program ++++++++++
1st step: decompose by operator-type
Possible Sub-Plan, decompose...
(SAVE SUBPLAN #:P1)
Possible Sub-Plan, decompose...
(SAVE SUBPLAN #:P2)
Single Plan
(SAVE SUBPLAN #:P3)
(#:P1 (#:P2 (#:P3)))

2nd step: Identify and introduce data type
(INSPECTING #:P1) Plan is of type SET
Unify equivalent paths...
Introduce complex object (CO)
(CO IS (O1 O2 O3))
A function for generating CO from a goal is provided: make-co
Generalize predicate...
(NEW PREDICATE IS (#:AT* CO B))
Generate a function for testing the new predicate...
New predicate covers top-level goal -> replace goal
Replace basic predicates by new predicate...
Introduce selector functions...
((((O1 O2) (RST (O1 O2 O3))) ((O1) (RST (RST (O1 O2 O3))))
  (NIL (RST (RST (RST (O1 O2 O3))))))
 ((O3 (PICK (O1 O2 O3))) (O2 (PICK (RST (O1 O2 O3))))
  (O1 (PICK (RST (RST (O1 O2 O3)))))))
RST(CO) and PICK(CO) are predefined (as cdr and car).

(INSPECTING #:P2) Plan is linear
Plan consists of a single step
(SET ADD-PRED AS INTERMEDIATE GOAL (AT ROCKET B))

(INSPECTING #:P3) Plan is of type SET
Unify equivalent paths... [... see P1]
Generalize predicate...
(NEW PREDICATE IS (#:INSIDER* CO))
Generate a function for testing the new predicate...
New predicate is set as goal!
Replace basic predicates by new predicate...
Introduce selector functions... [... see P1]

3rd step: Transform plan to program
Show Plan(s) as Program(s)? y

Fig. 8.13. Protocol of Transforming the Rocket Plan



For the rocket domain our system can learn the complete and correct control
rules. All problems of this domain can now be solved in linear time. A
small flaw is that generalization-to-n assumes infinite capacity of the
transport vehicle and does not take into account capacity as an additional
constraint. To get more realistic, we have to include a resource variable⁸ for
the load operator. The resulting load-all function must involve an additional
condition:

load-all(oset, c, s) = if(or(eq0(c), inside*(oset, Rocket, s)),
                          s,
                          load(pick(oset), load-all(rst(oset), pred(c), s))).

A further extension of the domain would be to take into account different
priorities for objects to be transported. This would involve an extension of
pick, selecting always the object with highest priority, that is, the control
function would follow a greedy-algorithm.
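The effect of the additional resource parameter can be tried out with a small self-contained sketch. It rests on the same simplifying assumptions as before (literals (at <obj> a) and (inside <obj> rocket), no precondition checks); the names load-object, all-inside-p and load-all-capacity are hypothetical.

```lisp
;; Hypothetical sketch: load-all with a capacity counter C.
(defun all-inside-p (oset s)
  (every (lambda (o) (member `(inside ,o rocket) s :test #'equal)) oset))

(defun load-object (o s)
  (cons `(inside ,o rocket) (remove `(at ,o a) s :test #'equal)))

(defun load-all-capacity (oset c s)
  (if (or (zerop c) (all-inside-p oset s))
      s                                   ; vehicle full or nothing left to load
      (load-object (first oset)
                   (load-all-capacity (rest oset) (1- c) s))))
```

With capacity 2 and three objects, only the first two objects of oset end up inside the vehicle; the third stays at its original place.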

A Lisp-program representing the control knowledge for the rocket domain 
is given in appendix CC.4 together with a short discussion about interleaving 
the inside and at predicates. 



8.5 Plans over Lists of Objects 

8.5.1 Structural and Semantic List Problems 

List-problems can be divided into two classes: (a) problems which involve no
knowledge about the elements of the list, and (b) problems which involve
such knowledge. Standard programming problems of the first class are for
example reversing a list, flattening a list, or incrementing elements of a list.
Problems where some operation is performed on every element of a list can
be characterized by the higher-order function (map f l). Other list problems
which can be solved purely structurally are calculating the length of a list or
adding the elements of a list of numbers. Such problems can be characterized
by the higher-order function (reduce f b l). The unload and load problems
discussed above fall into this class if each object involved has a unique name
and if pick and rst are realized in a deterministic way. A third class of
problems follows (filter p l), for example the functions member or odd-els. This
class already involves some semantic knowledge about the elements of a list
- represented by the predicate p in filter and by the equal test in member.
Table 8.4 illustrates structural list-problems.

Structural list problems, as discussed for example in section 6.3.4.1 in
chapter 6, can be dealt with by an algorithm nearly identical to the algorithm
in table 8.3 dealing with sets (see tab. 8.5). Because the only relevant
information is the length of a list, the partial order can be reduced to a total
order (see fig. 8.14). Generating a total order results in linearizing the problem
(Wysotzki & Schmid, 2001). The extraction of a unique path in the plan is
only slightly more complicated than for sets and is discussed below.

⁸ Dealing with resource-variables is possible with the DPlan-system extended to
function applications, see chap. 4. Currently we cannot generate disjunctive
boolean operators in plan transformation.

Table 8.4. Structural Functions over Lists

(a) (map f l) == (if (empty l) nil (cons (f (head l)) (map f (tail l))))
    (inc l) == (if (empty l) nil (cons (succ (head l)) (inc (tail l))))

(b) (reduce f b l) == (if (empty l) b (f (head l) (reduce f b (tail l))))
    (sumlist l) == (if (null l) 0 (plus (head l) (sumlist (tail l))))
    (rec-unload oset s) == (if (empty oset) s
        (unload (pick oset) (rec-unload (rst oset) s)))

(c) (filter p l) == (if (empty l) nil
        (if (p (head l)) (cons (head l) (filter p (tail l))) (filter p (tail l))))
    (odd-els l) == (if (empty l) nil
        (if (odd (head l)) (cons (head l) (odd-els (tail l))) (odd-els (tail l))))
    (member e l) == (if (empty l) nil
        (if (equal (head l) e) e (member e (tail l))))
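The three schemes of table 8.4 translate directly into Lisp. In the sketch below the suffix -rec is added to the names only because map and reduce are already defined in Common Lisp; inc, sumlist and odd-els are then obtained by supplying the parameter functions.

```lisp
;; Sketch of the higher-order schemes (a), (b), (c) and one instance of each.
(defun map-rec (f l)
  (if (null l)
      nil
      (cons (funcall f (first l)) (map-rec f (rest l)))))

(defun reduce-rec (f b l)
  (if (null l)
      b
      (funcall f (first l) (reduce-rec f b (rest l)))))

(defun filter-rec (p l)
  (cond ((null l) nil)
        ((funcall p (first l)) (cons (first l) (filter-rec p (rest l))))
        (t (filter-rec p (rest l)))))

(defun inc (l)     (map-rec #'1+ l))        ; instance of (a)
(defun sumlist (l) (reduce-rec #'+ 0 l))    ; instance of (b)
(defun odd-els (l) (filter-rec #'oddp l))   ; instance of (c)
```

Only filter-rec inspects the elements themselves through the predicate p, which is the point at which semantic knowledge about list elements enters.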



Definition 8.5.1 (List). Data type list is defined as:
list = nil | cons(e, list) with

null(list) = true if list = nil, false otherwise
head(cons(e, list)) = e
tail(cons(e, list)) = list

(b)  nil
     (w)
     (w w)
     (w w w)
     (w w w w)

Fig. 8.14. Partial Order (a) and Total Order (b) of Flat Lists over Numbers

Table 8.5. Introducing List

- Collapse plan to one path (discussed in detail below).
- Generate a complex data object CO = (e1 ... em).
  A function for generating CO from the top-level goals (make-co) is provided.
- A generalized predicate (g* args) with CO ∈ args is constructed by collecting
  all predicates (g args) with o ∈ CO ∧ o ∈ args and replacing o by CO. For a
  plan starting at level 0, g has to be a top-level goal and all top-level goals
  have to be covered by g*; for other plans, g has to be in the current root node.
- A function for testing whether g* holds in a state is generated as bottom-test.
- Introduce the data type into the plan:
  For each state keep only bottom-test predicate (g* args) with CO' ⊆ CO and
  CO' ∈ args. Introduce list-selectors by replacing CO' by (tail^i CO) and
  afterwards replacing action-arguments by (head (tail^i CO)) with (tail^i CO)
  occurring as argument of g* of the parent-node.



While functions over lists involving only structural knowledge are easy
to infer with our approach - as shown for the rocket domain - this is not
true for the second class of problems. A prototypical example for this class is
sorting: for sorting a list, knowledge about which element is smaller (or larger)
than another is necessary. That synthesizing functions involving semantic
knowledge is notoriously hard is discussed at length in the ILP literature
(Flener & Yilmaz, 1999; Le Blanc, 1994), and inductive functional program
synthesis typically is restricted to structural problems (Summers, 1977).

Currently, we approach transformation for such problems by the steps
presented in the algorithm in table 8.6. We do not claim that this strategy
is applicable to all semantic problems over lists. We developed this strategy
from analyzing and implementing plan transformation for the selection sort
problem, which is described below. We will describe how semantic knowledge
can be “detected” by analyzing the structure of a plan. For the future, we
plan to investigate further problems and try to find a strategy which covers
a class as large as possible.

Table 8.6. Dealing with Semantic Information in Lists

- Extracting a path (identifying list structure):
  - Extracting a minimal spanning tree from the DAG: The plan is a DAG, but
    the structure does not fulfill the set criterion defined above. Therefore, we
    cannot just select one path, but we have to extract one deterministic set of
    transformation sequences. For purely structural list-problems every minimal
    spanning tree is suitable for generalization. For problems involving semantic
    knowledge only some of the minimal spanning trees can be generalized.
  - Regularization of the tree: Generating plan levels with identical actions by
    shifting nodes downward in the plan and introducing edges with “empty” or
    “id” actions.
  - Collapsing the tree: Unifying identical subtrees which are positioned at the
    same level of the tree.
- If there are still branches left (identifying semantic criterion for elements):
  - Identify a criterion for classifying elements.
  - Unify branches by introducing list as argument into operator using the
    criterion as selection-function.
- Proceed using algorithm 8.5.

8.5.2 Synthesizing ‘Selection-Sort’

8.5.2.1 A Plan for Sorting Lists. The specification for sorting lists with
four elements is given in table 3.6. In the standard version of DPlan described
in this report we only allow for ADD-DEL-effects and we do not discriminate
between static predicates (such as greater than, being not affected by
operator application) and fluid predicates. Note that it is enough to specify
the desired position of three of the four list-elements in the goal, because
positioning three elements determines the position of the fourth. A more
natural specification for DPlan with functions is given in figure 4.10. This
second version allows for plan construction without using a set of predefined
states. Information about which element is on which position in the list or
what number is greater than another can simply be “read” from the list by
applying predefined (LISP-) functions. The definition of the swap-operator
determines whether the problem is solved by bubble-sort or by selection-sort.
In the first case, swap is applied to neighboring elements where the first is
greater than the other; in the second case, the first condition is omitted. Note
that for finding operator-sequences for sorting a list in ascending order
by backward planning, the greater condition is reversed!

The universal plan is given in figure 3.7. For sorting a list of four elements,
there exist 24 states. Swapping elements with the restrictions given for
selection sort results in 72 edges. Sorting lists of three elements is illustrated in
appendix CC.5.

8.5.2.2 Different Realizations of ‘Selection Sort’. To make the
transformation steps more intuitive, we first discuss functional variants of selsort
(see tab. 8.7): The first variant is a standard implementation with two nested
for-loops. The outer loop processes the list l (more exactly, the array) from
start (s) to end (e), the inner loop (function smpos) searches for the position
of the smallest element in l, starting at index (1 + s) where s is the current
index of the outer loop. The for-loops are realized as tail-recursions.

There is some conflict between a tail-recursive structure - where some
input state is transformed step-wise to the desired output - and a plan
representing a sequence of actions from the goal to some initial state. Our
definition of a plan as program implies a linear recursion (see def. 8.1.1). The
second variant of selection sort is a linear recursion: For a list l with starting
index s and last index e, it is checked whether l is already sorted from s to a
current index c which is initialized with e and step-wise reduced. If yes, the
list is returned; otherwise, it is determined which element at positions c, ..., e
should be swapped to position c − 1. The inner loop is replaced by an explicit
selector-function. We will see below that this corresponds to selecting one
of several elements represented at different branches on the same level of the
plan.

Table 8.7. Functional Variants for Selection-Sort

; (1) Standard: Two Tail-Recursions
(defun selsort (l s e)
   (if (= s e)
       l
       (selsort (swap s (smpos s (1+ s) e l) l) (1+ s) e)
   ))

(defun smpos (s ss e l)
   (if (> ss e)
       s
       (if (> (nth s l) (nth ss l))
           (smpos ss (1+ ss) e l)
           (smpos s (1+ ss) e l)
   )))

; (2) Realization as Linear Recursion
; c is "counter", starting with last list-position e
(defun lselsort (l c s e)
   (if (sorted l s c)
       l
       (swap* (1- c) c e (lselsort l (1- c) s e))
   ))

(defun swap* (s from to l)
   (swap s (smpos s from to l) l)
)

(defun sorted (l from c)
   (equal (subseq l from c) (subseq (sort (copy-seq l) '<) from c))
)

; (3) Explicit definition of order (gl is list of pos-key pairs)
; e.g. gl = ((3 4) (2 3) (1 2) (0 1)) --> sorted l = (1 2 3 4)
(defun llselsort (gl l)
   (if (lsorted gl l)
       l
       (lswap* (car gl) (llselsort (cdr gl) l))
   ))

; every pos-key pair in gl is checked at its own position in l
(defun lsorted (gl l)
   (cond ((null gl) T)
         ((equal (second (car gl)) (nth (first (car gl)) l))
          (lsorted (cdr gl) l))
         (T nil)
   ))

(defun lswap* (g l)
   (swap (first g) (position (second g) l) l)
)
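Table 8.7 leaves the swap operation itself undefined. The following self-contained sketch supplies one possible non-destructive swap and a compact function modeled on variant (3), so that sorting towards an explicitly given goal order can be tried out. The names goal-list, goals-hold-p and goal-sort are hypothetical, positions are assumed to be 0-based, and all keys are assumed to be distinct.

```lisp
;; Hypothetical sketch: swap plus goal-directed sorting in the style of (3).
(defun swap (i j l)
  "Return a copy of L with the elements at positions I and J exchanged."
  (let ((copy (copy-list l)))
    (rotatef (nth i copy) (nth j copy))
    copy))

(defun goal-list (target)
  "Build pos-key pairs from TARGET, last position first, e.g. ((3 4) ... (0 1))."
  (let ((pairs '()))
    (loop for key in target
          for pos from 0
          do (push (list pos key) pairs))
    pairs))

(defun goals-hold-p (gl l)
  (every (lambda (g) (equal (nth (first g) l) (second g))) gl))

(defun goal-sort (gl l)
  (if (goals-hold-p gl l)
      l
      (let ((g (car gl))
            (partly-sorted (goal-sort (cdr gl) l)))
        (swap (first g) (position (second g) partly-sorted) partly-sorted))))
```

Because the order is taken from the goal list, (goal-sort (goal-list '(4 3 2 1)) '(1 2 3 4)) arranges the list in descending order with the same control structure that sorts it in ascending order for (goal-list '(1 2 3 4)).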



The second variant corresponds closely to the function we can infer
from the plan; it can be inferred from a plan generated by using
function-application.⁹ A plan constructed by manipulating literals contains no
knowledge about numbers as indices in a list and order relations between numbers.
Looking back at plan transformation for rocket, we introduced a “complex
object” oset which guided action application in the recursive unload and load
function. Thus, for sorting, we can infer a complex object from the top-level
goals which determine which number should be at which position of the list.
The third variant gives an abstract representation of the function which we
can infer automatically from the plan in figure 3.7. This function is more
general than standard selection sort, because now lists can be sorted in
accordance to any arbitrary order relation specified in parameter gl. That is,
it does not rely on the static predicate gt(x, y) which represents the order
relation of the first n natural numbers (see fig. 3.6).

⁹ Our investigation of plan transformation for plans involving function-application
is still at the beginning.

8. 5. 2. 3 Inferring the Function- Skeleton. The plan given in figure 3.7 is 
not decomposed by the initial decomposition step, because the complete plan 
involves only one operator ~ swap. The plan is a DAG, but it does not fulfill 
the criterium for sets of objects. Therefore, extraction of one unique set of 
optimal plans is done by picking one minimal spanning tree from the plan 
(see sect. 3.2 in chap. 3). 

The plan for sorting lists with three elements (see appendix CC.5) consists 
of 3! = 6 nodes and 9 edges. It contains 9 different minimal spanning trees 
(see also appendix). Not all of them are suitable for generalization: Three of 
the nine trees can be “regularized” (see next transformation step below). If 
we have no information for picking a suitable minimal spanning tree, we have 
to extract a tree, try to regularize it and backtrack if regularization fails. For 
9 candidates with 3 suitable solutions, this is feasible. But for the sorting of 4 
element list, there exist 24 nodes, 72 edges and more than a million possible 
minimal spanning trees with only a small amount of them being regularizable 
(see appendix AA.IO for calculation of number of minimal spanning trees). 
Currently, we pre-calculate the number of trees contained in the DAG and 
if the number exceeds a given threshold t (say 500), we only generate the 
first t trees. One possible but unsatisfying solution would be to parallelize 
this step, which is possible. We plan to investigate whether tree-extraction 
and regularization can be integrated into one step. This would solve the 
problem in an elegant way. One of the regularizable minimal spanning trees 
for sorting four elements is given in figure 8.15. For bettern readability, the 
state descriptions in the nodes are given as lists. In the original plan, each list 
is described by a set of literals. For example, [1 2 3 4] is a short notation for 
(isc pi 1) (isc p2 2) (isc p3 3) (isc p4 4) where isc(x y) means that position 
X has content y (see sect. 3. 1.4. 3 in chap. 3 for details). 

The algorithm for extracting a minimal spanning tree from a DAG is
given in table 8.8, the corresponding program fragment in appendix AA.11.

The original plan represented how each of the 24 possible lists over four
(different) numbers can be transformed into the desired goal (the sorted list)
by the set of all possible optimal sequences of actions. The minimal spanning
tree gives a unique sequence of actions for each state. In the minimal spanning
tree given in figure 8.15, the action sequences for sorting lists by shifting the
smallest element to the left are given.

Fig. 8.15. A Minimal Spanning Tree Extracted from the SelSort Plan

Table 8.8. Extract an MST from a DAG

- Input: a DAG with edges (n m) annotated by the level of the node
- Initialization: t == root(dag) (minimal spanning tree)
- For l = 0 to maxlevel - 1 DO:
  - Partition all edges (n m) from nodes n at level l to nodes m at level l + 1 into
    groups with equal end-node m:
    $p = \{\{(n_1\ m_1)\,(n_2\ m_1) \ldots (n_i\ m_1)\}, \ldots, \{(n'_1\ m_j) \ldots (n'_k\ m_j)\}\}$.
  - Calculate the Cartesian product between sets in p: $Cart = p_1 \times p_2 \times \ldots \times p_j$.
  - Generate trees $t' = t \cup c$, for all $c \in Cart$.
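The node, edge, and tree counts quoted for the sorting plans can be checked with a small sketch. It rests on assumptions that are not taken from the appendix: states are plain lists such as (2 1 3), swap may exchange any two positions, and the level of a state is its distance from the sorted goal. The names swap-positions, neighbours, plan-levels, and plan-statistics are hypothetical. Because every non-root node keeps exactly one of its incoming optimal edges, the number of spanning trees is the product of the sizes of the edge groups formed in the partition step.

```lisp
;; Sketch only: states are lists such as (2 1 3); all helper names are
;; illustrative and not taken from the program in the appendix.

(defun swap-positions (state i j)
  "Return a copy of STATE with the elements at indices I and J exchanged."
  (let ((new (copy-list state)))
    (rotatef (nth i new) (nth j new))
    new))

(defun neighbours (state)
  "All states reachable from STATE by exchanging two positions."
  (let ((n (length state))
        (result '()))
    (dotimes (i n result)
      (loop for j from (1+ i) below n
            do (push (swap-positions state i j) result)))))

(defun plan-levels (goal)
  "Breadth-first search from GOAL; maps each state to its level."
  (let ((level (make-hash-table :test #'equal))
        (frontier (list goal)))
    (setf (gethash goal level) 0)
    (loop for depth from 1
          while frontier
          do (let ((next '()))
               (dolist (state frontier)
                 (dolist (nb (neighbours state))
                   (unless (nth-value 1 (gethash nb level))
                     (setf (gethash nb level) depth)
                     (push nb next))))
               (setf frontier next)))
    level))

(defun plan-statistics (goal)
  "Return the list (nodes edges trees) for the optimal plan rooted at GOAL."
  (let ((level (plan-levels goal))
        (nodes 0) (edges 0) (trees 1))
    (maphash (lambda (state depth)
               (incf nodes)
               (when (> depth 0)
                 ;; group = incoming optimal edges, i.e. neighbours that are
                 ;; one level closer to the goal
                 (let ((group (count-if (lambda (nb)
                                          (= (gethash nb level) (1- depth)))
                                        (neighbours state))))
                   (incf edges group)
                   (setf trees (* trees group)))))
             level)
    (list nodes edges trees)))
```

Under these assumptions, (plan-statistics '(1 2 3)) returns (6 9 9) and (plan-statistics '(1 2 3 4)) returns (24 72 2448880128), which agrees with the counts given in the text.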

For collapsing a tree into a single path, this tree has to be regular¹⁰:

Definition 8.5.2 (Regular Tree). An edge-labeled tree t with edges (n m)
from nodes n to nodes m is regular, if for each level l = 0 ... maxlevel - 1 the
subtrees for each node n, $\{(n\ m_1), (n\ m_2), \ldots, (n\ m_k)\}$, consist of id-identical
label-sets, with $label =_{id} label'$ if label = label' or label = id.

One can try to transform a given minimal spanning tree into a
regular tree, using the algorithm given in table 8.9. The program fragment
for tree-regularization is given in appendix AA.12. A tree is regularized by
pushing all edges labeled with actions that also occur on the next level of the
tree down to that level. The starting node of such an edge is “copied” and
an id-edge is introduced between the original and the copied starting node.
Note that an edge can be shifted more than once. If the result is a regular
tree according to definition 8.5.2, plan-transformation proceeds; otherwise, it
fails.

Table 8.9. Regularization of a Tree 

- Input: an edge-labeled tree
- For l = 0 to maxlevel - 2 DO:
  - Construct a label-set $LS_l$ for all edges (n m) from nodes n on level l to nodes
    m on level l + 1 and a label-set $LS_{l+1}$ for all edges (o p) from nodes o on level
    l + 1 to nodes p on level l + 2.
  - IF $LS_{l+1} \subseteq LS_l$, shift all edges $(n_s\ m_s) \in LS_l \cap LS_{l+1}$ one level and introduce
    edges $(n_s\ n'_s)$ with label “id” from level l to the shifted nodes $n'_s$ on level l + 1.
- Test if the resulting tree is regular.



The regularized version of the minimal spanning tree for selsort is given
in figure 8.16 (again with abbreviated state descriptions). The structure of
the recursion to be inferred is now already visible in the regularized tree: The
“parallel” subtrees have identical edge-labels and the states share a large
overlap: The actions in the bottom level of the tree all swap the
element on position one with some element further to the right in the list
(swap(p1, x)). For the resulting states, the first element is already the smallest
element of the list ([1 x y z]). On the next level, all actions swap
the element on position two with some element further to the right, and so
on. A slightly different method for regularization, also with the goal of plan
linearization, is presented in Wysotzki and Schmid (2001).

¹⁰ This definition of regular trees is similar to the definition of irrelevant attributes
in decision trees (Unger & Wysotzki, 1981): If a node has only identical subtrees,
the node and all but one subtree are eliminated from the tree.




Fig. 8.16. The Regularized Tree for SelSort 



8.5.2.4 Inferring the Selector Function. Although the selsort plan could
be transformed successfully to a regular tree with identical subtrees, the plan
is still not linear. Considering the nodes at each planning level, these nodes
share a common sub-set of literals. That is, at each level, the commonalities
of the states can be captured by calculating the intersection of the state
descriptions. The ordering of the branches from one node is
irrelevant for obtaining the action sequence for transforming a given state
into the goal state. Furthermore, the names of all actions are identical and, at
each level, the first argument of swap is identical. Putting this information
together, we can represent the levels of the regularized tree as:

- ((isc p1 1) (isc p2 2) (isc p3 3)) ← (swap p3 [p4])
- ((isc p1 1) (isc p2 2)) ← (swap p2 [p3, p4])
- ((isc p1 1)) ← (swap p1 [p2, p3, p4]).

From the instantiations of the second argument we can construct a hypothetical
complex object: COs = [p4] < [p3, p4] < [p2, p3, p4]. Note that the
numbers associated with positions are not recognized as numbers. Another
minimal spanning tree extracted from the plan would result in a different
pattern, for example [p2] < [p4, p2] < [p1, p4, p2], which is generalizable in the
same way as we will describe in the following.
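Capturing the commonalities of the states on one level amounts to intersecting their literal sets. The following is a minimal sketch of that step, assuming that states are written in the short list notation and that the position names are supplied explicitly; list->literals, common-literals, and level-description are hypothetical helper names, not functions of the system described here.

```lisp
;; Sketch only: POSITIONS and all helper names are assumptions for illustration.

(defun list->literals (positions contents)
  "Describe a list by literals: (p1 p2) and (3 1) give ((isc p1 3) (isc p2 1))."
  (mapcar (lambda (pos key) (list 'isc pos key)) positions contents))

(defun common-literals (states)
  "Literals shared by all state descriptions in STATES (a non-empty list)."
  (reduce (lambda (acc state) (intersection acc state :test #'equal))
          states))

(defun level-description (positions lists)
  "Common literals of the states LISTS, given in the short list notation."
  (common-literals
   (mapcar (lambda (contents) (list->literals positions contents)) lists)))
```

For example, (level-description '(p1 p2 p3 p4) '((1 3 2 4) (1 2 4 3) (1 4 3 2))) yields ((isc p1 1)), the description shared by states in which only the first position already holds its goal element. For larger intersections the order of the returned literals is implementation-dependent.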

The data object which finally will become the recursive parameter is constructed
along a path in the plan (as described for rocket). On each level in
the plan, the argument(s) of an action can be characterized relative to the
object involved in the parent node. Now, we have to introduce a selector function
for an argument of an action involving actions on the same level of the
plan with the same parent node. That is, the searched-for function has to be
defined with respect to the children of the action. Remember that this is a
backward plan and that a child-node is input to an action. As a consequence,
the selector function has to be applied to the current instantiation of the list
(situation).

The searched-for function for selecting one element of the candidate elements
represented in the second argument of swap has to be defined relative
to the information available at the current “position” in the plan. That is,
the literals of the parent node and the first argument of the swap operator
can be used to decide which element should be swapped in the current (child)
state. For example, for (swap p3 [p4]), we have (sel COs) = p4 from ((isc
p1 1) (isc p2 2) (isc p3 4) (isc p4 3)). The first argument of swap occurs in
(isc p3 3) of the parent node. The position to be selected - p4 - is related
to (isc p4 3) in the child node. That is, when applying swap, the first position
is given and the second position must be obtained somehow. A modified
swap-function which deals with selecting the “right” position to swap must
provide the following characteristics:

- Because the overall structure of the plan indicates a list problem, the goal
  predicates must be processed in a fixed order which can be derived from
  the sequence of actions in the plan. For a given list of goal predicates, deal
  with the first of these predicates.

  For example, the goal predicates can be (isc p1 1) (isc p2 2) (isc p3 3).
  The first element to be considered is (isc p1 1).

- A generalized swap* operator works on the current goal predicate and the
  current situation s.

  For example, swap* realizes (isc p1 1) in situation s = (isc p1 4) (isc p2
  1) (isc p3 2) (isc p4 3).

- The current goal is split into the current position and the element for
  this position.

  For example, (pos (isc p1 1)) = p1 and (key (isc p1 1)) = 1.

- The element currently at position (pos (isc x y)) must be swapped with
  the element which should be at this position, that is (key (isc x y)).

  For s = (isc p1 4) (isc p2 1) (isc p3 2) (isc p4 3), the result is that position p1
  and position p2 are swapped.

Constructing this hypothesis presupposes that the plan-transformation
system has predefined knowledge about how to access elements in lists.¹¹
Written as Lisp-functions, the general hypothesis is:

(defun swap* (fp s)
  (pswap (ppos fp) (sel (pkey fp) s) s))

(defun sel (k s)
  (ppos (car
         (mapcan #'(lambda (x) (and (equal (pkey x) k) (list x))) s))))

where pswap is the swap function predefined in the domain specification and
ppos and pkey select the second/third element of a list (isc x y).¹²

¹¹ In the current implementation we provide this knowledge only partially. For
example, we can select an element x ∈ arg from a list (p arg), as needed to
construct the definition of succ for sequences. That is, we can construct pos and
key. For constructing the definition for sel, we need additionally the selection of
an element of a list of literals which follows a given pattern. This can be realized
with the filter-function mapcan. But up to now, we did not implement such
complex selector functions.

¹² The second condition (list x) for mapcan is due to the definition of this functor
in Lisp. Without this condition, the functor returns T or nil; with this condition,
it returns a list of all elements fulfilling the filter-condition (equal (pkey x) k).

The generalized swap*-function is tested for each swap-action occurring
in the plan. Because it holds for all cases, plan-transformation can proceed.
If the hypothesis had failed, a new hypothesis would have had to be constructed.
If all hypotheses fail, plan-transformation fails. For the selsort plan, only the
hypothesis introduced above is possible.

The rest of plan-transformation is straightforward: The regularized tree
is reduced to a single path, unifying branches by replacing the swap action by
swap* (see fig. 8.17). Note that, in contrast to the rocket problem, this path
already contains “id” branches which will constitute the “then” case for the
nested conditional to which the plan is finally transformed. Note further that
only the first argument of swap* is given, because the second argument s is
given by the sub-tree following the action. This is the same way of representing
plans as terms as used in the sections before (for unstack and rocket).

Fig. 8.17. Introduction of a “Semantic” Selector Function in the Regularized Tree

Data type introduction works as described for rocket (for set and structural
list problems): a complex object CO = ((isc p1 1) (isc p2 2) (isc p3
3)) and a generalized predicate (isc* CO) for the bottom-test is inferred. The
state-nodes are rewritten using the rest-selector tail on CO. The head selector
is introduced in the actions: (swap* (head (tail^i CO))). The resulting recursive
function is:

(defun pselsort (pl s)
  (if (isc* pl s)
      s
      (swap* (head pl)
             (pselsort (tail pl) s))))

Note that head and tail here correspond to last and butlast. But, because
lists are represented as explicit position-key pairs, pselsort transforms lists
s to lists sorted according to the specification derived from the top-level goals
given in pl, independent of the transformation sequence! The Lisp-program
realizing selection sort is given in figure 8.18.

8.5.3 Concluding Remarks on List Problems 

List sorting is an example for a domain which involves not only structural but
also semantic knowledge: While for structural problems, the elements of the
input-data can be replaced by variables with arbitrary interpretation, this is
not true for semantic problems. For example, reversing a list [x, y, z] works in
the same way, regardless whether this list is [1, 2, 3] or [1, 1, 1]. In semantic
problems, such as sorting, operators comparing elements (x < y or x = y)
are necessary. A prototypical example for a problem involving semantics is
the member function which returns true if an element x is contained in a list
l and false otherwise. Here an equality test must be performed. A detailed
discussion of the reasons why the member function cannot be inferred using
standard techniques of inductive logic programming is given by Le Blanc
(1994).

Transforming the selsort plan into a finite program involved two critical
steps: (1) extracting a suitable minimal spanning tree from the plan and (2)
introducing a “semantic” selector function. The inferred complex object represents
the number of elements which are already on the goal position. This is
in analogy to the rocket problem, where the complex object represented how
many objects are already at the goal-destination (at location B for unload-all
or inside the rocket for load-all). Plan transformation results in a finite
program which can be generalized to a recursive sort function sharing crucial
characteristics with selection sort. But the function inferred by our system
differs from standard selection sort in two aspects: First, the recursion is linear,
involving a “goal” stack. The nested for-loops (two tail-recursions) of
the standard function are realized by a single linear recursive call. Of course,
the function for selecting the current position is itself a loop: the literal list
is searched for a literal corresponding to a given pattern by a higher-order
filter function. Second, it does not rely on the ordering of natural numbers,
but generates the desired sequence of elements, explicitly given as input.

; Complete Recursive Program for SelSort
; for lists represented as literals

; pl is the inferred complex object, e.g., ((p1 1) (p2 2) (p3 3))
; s is the situation (statics can be omitted),
; e.g., ((isc p1 3) (isc p2 1) (isc p3 4) (isc p4 2))

(defun pselsort (pl s)
  (if (isc* pl s)
      s
      (swap* (head pl)
             (pselsort (tail pl) s))))

(defun swap* (fp s)
  (pswap (ppos fp) (sel (pkey fp) s) s))

(defun sel (k s)
  (spos (car
         (mapcan #'(lambda (x) (and (equal (skey x) k) (list x))) s))))

(defun isc* (pl s)
  (subsetp pl (mapcar #'(lambda (x) (cdr x)) s) :test 'equal))

; selectors for elements of pl (p k)
(defun ppos (p) (first p))
(defun pkey (p) (second p))

; selectors for elements of s (isc p k)
(defun spos (p) (second p))
(defun skey (p) (third p))

; head and tail realized as last and butlast
; (from the order defined in the plan, alternatively: car cdr)
(defun head (l) (car (last l)))
(defun tail (l) (butlast l))

; explicit implementation of add-del effect
; in connection with DPlan: application of swap-operator on s
; "inner" union: i=j case
(defun pswap (i j s)
  (print `(swap ,i ,j ,s))
  (let ((ikey (skey (car (remove-if #'(lambda (x) (not (equal i (spos x)))) s))))
        (jkey (skey (car (remove-if #'(lambda (x) (not (equal j (spos x)))) s))))
        (rsts (remove-if #'(lambda (x) (or (equal i (spos x))
                                           (equal j (spos x))))
                         s)))
    (union (union (list (list 'isc i jkey))
                  (list (list 'isc j ikey)) :test 'equal)
           rsts :test 'equal)))

Fig. 8.18. LISP-Program for SelSort
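That the order is taken from the goal specification rather than from the ordering of numbers can be tried out with a condensed variant of the program in figure 8.18. This is only a sketch: pswap is rewritten here so that the order of literals in the situation is preserved (an assumption of this example, not the add-del implementation of the figure), the tracing output is dropped, and contents is a hypothetical helper for reading the result.

```lisp
;; Condensed variant of the program in fig. 8.18 (sketch): PSWAP is rewritten
;; to preserve literal order, and CONTENTS is a hypothetical helper.

(defun ppos (p) (first p))                 ; selectors for goals (p k)
(defun pkey (p) (second p))
(defun spos (lit) (second lit))            ; selectors for literals (isc p k)
(defun skey (lit) (third lit))
(defun head (l) (car (last l)))
(defun tail (l) (butlast l))

(defun isc* (pl s)
  "True if every goal pair in PL already holds in situation S."
  (subsetp pl (mapcar #'cdr s) :test #'equal))

(defun sel (k s)
  "Position that currently holds key K."
  (spos (find k s :key #'skey :test #'equal)))

(defun pswap (i j s)
  "Exchange the contents of positions I and J."
  (let ((ikey (skey (find i s :key #'spos)))
        (jkey (skey (find j s :key #'spos))))
    (mapcar (lambda (lit)
              (cond ((eql (spos lit) i) (list 'isc i jkey))
                    ((eql (spos lit) j) (list 'isc j ikey))
                    (t lit)))
            s)))

(defun swap* (fp s)
  (pswap (ppos fp) (sel (pkey fp) s) s))

(defun pselsort (pl s)
  (if (isc* pl s)
      s
      (swap* (head pl) (pselsort (tail pl) s))))

(defun contents (s positions)
  "Keys of S read off in the order given by POSITIONS."
  (mapcar (lambda (p) (skey (find p s :key #'spos))) positions))
```

With s = ((isc p1 3) (isc p2 1) (isc p3 4) (isc p4 2)), the goal list ((p1 1) (p2 2) (p3 3)) produces the contents (1 2 3 4), while the goal list ((p1 4) (p2 3) (p3 2)) produces (4 3 2 1) from the same situation.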

We demonstrated plan transformation of a plan for lists with four elements.
From a list with three elements, evidence for the hypothesis for generating
the semantic selector function would have been weaker (involving only
the actions from level one to two in the regularized tree). An alternative
approach to plan transformation, involving knowledge about numbers, is described
for a plan for three-element lists in Wysotzki and Schmid (2001). In
general, there are three backtrack-points for plan transformation:

- Generating “semantic” functions:
  If a generated hypothesis for the semantic function fails or if generalization-to-n
  fails, generate another hypothesis.
- Extracting a minimal spanning tree from a plan:
  If plan transformation or generalization-to-n fails, select another minimal
  spanning tree.
- Number of objects involved in planning:
  If plan transformation or generalization-to-n fails, generate a plan involving
  an additional object. (A plan must involve at least three objects
  to identify the recursive parameter and its substitution, as described in
  chapter 7.)

Because plan construction requires exponential effort (all possible states of a
domain for a fixed number of objects have to be generated and the number of
states can grow exponentially relative to the number of objects) and because
the number of minimal spanning trees might be enormous, generating a finite
program suitable for generalization is not efficient in the general case.
To reduce backtracking effort, we hope to come up with a good heuristic for
extracting a “suitable” minimal spanning tree in the future. One possibility,
mentioned above, is to try to combine tree extraction and regularization.
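The backtracking over candidate spanning trees, bounded by a threshold as described for tree extraction, can be sketched as a simple generate-and-test loop. The function name first-transformable, its keyword parameter, and the convention that a failed transformation returns NIL are assumptions made for this illustration only.

```lisp
;; Sketch only: CANDIDATES is a list of extracted trees and TRANSFORM is a
;; function that returns NIL when regularization or a later step fails.

(defun first-transformable (candidates transform &key (limit 500))
  "Try at most LIMIT candidates in order. Return the first non-NIL result of
TRANSFORM and, as second value, the number of candidates tried."
  (loop for candidate in candidates
        for tried from 1 to limit
        do (let ((result (funcall transform candidate)))
             (when result
               (return (values result tried))))
        finally (return (values nil nil))))
```

The same loop shape applies to the other two backtrack-points, with hypotheses for the semantic function or plans over additional objects taking the place of the candidate trees.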



8.6 Plans over Complex Data Types 

8.6.1 Variants of Complex Finite Programs 

The usual way to classify recursive functions is to divide them into different
complexity classes (Hinman, 1978; Odifreddi, 1989). In complexity theory, the
semantics of a recursive function is under investigation. For example, fibonacci
is typically implemented as tree-recursion (see tab. 8.10), but it belongs to
the class of linear problems - meaning, the fibonacci-number of a number
n can be calculated by a linear recursive function (Field & Harrison, 1988,
pp. 454). In our approach to program synthesis, complexity is determined
by the syntactical structure of the finite program, based on the structure of
a universal plan. The unfolding (see chap. 7) of all functions in table 8.10
results in a tree structure. Interpretation of max always involves only one
of the two tail-recursive calls (that is, the function is linear). Interpretation
of fib results in two new recursive calls for each recursive-step (resulting in
an effort $O(2^n)$). The Ackermann-function (ack) is the classic example for
a non-primitive recursive function with exponential growth - each recursive
call results in y + x new recursive calls.
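The three kinds of recursion can be written as runnable Common Lisp for comparison, together with one linear formulation of fibonacci. This is a sketch under assumptions: list-max stands in for max (which is a standard function and may not be redefined), fib-linear uses two accumulator arguments, which is one common way to obtain a linear recursion and not necessarily the formulation in Field & Harrison, and all names are illustrative.

```lisp
;; Sketch only: names are illustrative. MAX is a standard Common Lisp
;; function and may not be redefined, hence LIST-MAX.

(defun list-max (m l)
  "Alternative tail recursion: exactly one of the two recursive calls is taken."
  (cond ((null l) m)
        ((> (first l) m) (list-max (first l) (rest l)))
        (t (list-max m (rest l)))))

(defun fib-tree (x)
  "Tree recursion: two recursive calls per step."
  (cond ((= x 0) 0)
        ((= x 1) 1)
        (t (+ (fib-tree (- x 1)) (fib-tree (- x 2))))))

(defun fib-linear (x &optional (a 0) (b 1))
  "Linear recursion: A and B carry two consecutive Fibonacci numbers."
  (if (= x 0)
      a
      (fib-linear (- x 1) b (+ a b))))

(defun ack (x y)
  "Ackermann function with a nested recursive call."
  (cond ((= x 0) (1+ y))
        ((= y 0) (ack (1- x) 1))
        (t (ack (1- x) (ack x (1- y))))))
```

Both (fib-tree 10) and (fib-linear 10) return 55, but only the former unfolds into a tree of calls; (ack 2 3) returns 9 and (list-max 0 '(3 7 2)) returns 7.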

Table 8.10. Structural Complex Recursive Functions

Alternative Tail Recursion
(max m l) == (if (null l) m
                 (if (> (head l) m) (max (head l) (tail l)) (max m (tail l))))

Tree Recursion
(fib x) == (if (= 0 x) 0 (if (= 1 x) 1 (plus (fib (- x 1)) (fib (- x 2)))))

μ-Recursion
(ack x y) == (if (= 0 x) (1+ y) (if (= 0 y) (ack (1- x) 1) (ack (1- x) (ack x (1- y)))))

For plan transformation, on the other hand, semantics is taken into account
to some extent: As we saw above, plans are linearizable if the data
type underlying the plan is a set or a list. For the case of list problems involving
semantic attributes of the list elements, it depends on the complexity
of the involved “semantic” functions whether the resulting recursion is linear
or more complex. Currently, we do not have a theory of “linearizability” of
universal plans, but clearly, such a theory is necessary to make our approach
to plan transformation more general. A good starting point for investigating
this problem should be the literature on the transformational approach to
code optimization in functional programming (Field & Harrison, 1988). In
Wysotzki and Schmid (2001) linearizability is discussed in detail.

There are two well-known planning domains for which the underlying
data type is more complex than sets or lists: Tower of Hanoi (sect. 3.1.4.4 in
chap. 3) and building a tower of alphabetically sorted blocks in the blocks-world
domain (sect. 3.1.4.5 in chap. 3). The tower problem is a set of lists
problem and is used as one of the benchmark problems for planners. The hanoi
problem is a list of lists problem (the “outer” list is of length 3 for the
standard 3 peg problems). For both domains, the general solution procedure
to transform an arbitrary state into a state fulfilling the top-level goals is -
at least at first glance - more complex than a single linear recursion. Up to
now we cannot fully automatically transform plans for such complex domains
into finite programs. In the following, we will discuss possible strategies.

8.6.2 The ‘Tower’ Domain 

8.6.2.1 A Plan for Three Blocks. The specification of the three-block
tower problem was introduced in chapter 2 and is described in section 3.1.4.5
in chapter 3. The unstack/clearblock domain described above as an example for a
problem with underlying sequential data type is a sub-domain of this problem:
the puttable operator is structurally identical to the unstack operator. The
put operator is represented with a conditioned effect for taking care of the
case that a block x is moved from the table to another block y and for the
case that x is moved from another block z.

For the 3-block problem, the universal plan is a unique minimal spanning
tree (see fig. 3.10). Note that for a given goal to build a tower with {on(A, B),
on(B, C)}, for the state {on(C, B), on(B, A)} - that is, a tower where the blocks
are sorted in reverse order to the goal - only two actions are needed to reach
the goal state: the base of the tower, C, is put on the table, and B can immediately
be put on C without putting it on the table first. A program for realizing tower
is only optimal if these short-cuts are performed.

There are two possible strategies for learning a control program for tower
which are discussed in reinforcement learning under the labels incremental
elemental-to-composite learning and simultaneous composite learning (Sun &
Sessions, 1999): In the first case, the system is first trained with a simple
task, for example clearing an arbitrary block by unstacking all blocks lying
on top of it. The learned policy is stored, for example as CLEAR-program,
and extends the set of available basic functions. When learning the policy
for a more complex problem, such as tower, the already available knowledge
(CLEAR) can be used. In the second case, the system immediately learns
to solve the complex problem and the decomposition must be performed
autonomously. The application of both strategies to the tower problem is
demonstrated in Wysotzki and Schmid (2001). In the work reported here, we
focus on the second strategy.

8.6.2.2 Elemental to Composite Learning. Let us assume that the
control knowledge for clearing an arbitrary block is already available as a
CLEARBLOCK function:

CLEARBLOCK(x, s) = if(clear(x, s), s, puttable(topof(x), CLEARBLOCK(topof(x), s))).

This function immediately returns the current state s if clear(x) already
holds in s; otherwise it tries to put the block lying on top of x on the
table.
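A runnable reading of the CLEARBLOCK definition is sketched below. The state representation is an assumption of this example: a state is a list of literals (on x y) and (ct x), and blocks that do not occur as the first argument of an on-literal are taken to lie on the table. Unlike the formula, topof receives the state explicitly; clear-p, topof, and this version of puttable are hypothetical stand-ins for the operators of the domain specification.

```lisp
;; Sketch only: a state is a list of literals (on x y) and (ct x);
;; blocks without an on-literal are assumed to lie on the table.

(defun clear-p (x s)
  "True if block X is clear in state S."
  (member (list 'ct x) s :test #'equal))

(defun topof (x s)
  "The block lying directly on X in S, or NIL."
  (second (find-if (lambda (lit)
                     (and (eq (first lit) 'on) (eql (third lit) x)))
                   s)))

(defun puttable (x s)
  "Move the clear block X from the block below it to the table.
X is assumed to lie on another block."
  (let ((below (third (find-if (lambda (lit)
                                 (and (eq (first lit) 'on) (eql (second lit) x)))
                               s))))
    (cons (list 'ct below)
          (remove (list 'on x below) s :test #'equal))))

(defun clearblock (x s)
  "Unstack everything above X; return S unchanged if X is already clear."
  (if (clear-p x s)
      s
      (puttable (topof x s) (clearblock (topof x s) s))))
```

For the tower ((on a b) (on b c) (ct a)), the call (clearblock 'c ...) first puts a and then b on the table and returns ((ct c) (ct b) (ct a)).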

Now we want to learn a program for solving the more complex tower
problem. In Wysotzki and Schmid (2001) we describe an approach based on
the assumption of linearizability of the planning problem (see sect. 2.3.4.2
in chap. 2): It is presupposed that sub-goals can be achieved immediately
before the corresponding top-level goal is achieved. For example, to reach a
state where block A is lying on block B, the action put(A, B) can be applied;
but this action is applicable only if both A and B are clear. That is, to
realize goal on(A, B) the sub-goals clear(A) and clear(B) must hold in the
current situation. Allowing the use of the already learned recursive CLEARBLOCK
function and using linearization, the complex plan for solving the
tower problem for three blocks can be collapsed to:

$$(on\ a\ b)\,(on\ b\ c) \xrightarrow{(put\ a\ b)\,\circ\,(CLEARBLOCK\ a)\,\circ\,(CLEARBLOCK\ b)} (on\ b\ c) \xrightarrow{(put\ b\ c)\,\circ\,(CLEARBLOCK\ b)\,\circ\,(CLEARBLOCK\ c)} s.$$

The CLEARBLOCK program applies as many puttable actions as necessary.
If a block is already clear, the state is returned unchanged. The recursive
program generalizing over this plan is given in appendix CC.6.
For some domains, it is possible to identify independent sets of predicates
by analyzing the operator specifications, for example using the TIM-analysis
(see sect. 2.5.1 in chap. 2) for the planner STAN (Long & Fox, 1999).

While the approach of Wysotzki and Schmid (2001) is based on analyzing
the goals and sub-goals contained in a plan, the strategy reported in this
chapter is based on data type inference for uniform sub-plans (see sect. 8.2.1).
If you look at the plan given in figure 3.10, the two upper levels contain only
arcs labelled with put actions and all levels below contain only arcs labelled
with puttable actions. Therefore, as demonstrated for rocket above, the plan
is in a first step decomposed and it becomes impossible to infer a program
where put and puttable actions are interwoven.

8.6.2.3 Simultaneous Composite Learning. Initial plan decomposition
for the 3-block tower plan results in two sub-plans - a sub-plan for put-all and
a sub-plan for puttable-all. The put-all sub-plan is a regular tree as defined
above. The only level with branching is for actions (put B C) and the plan can
be immediately reduced to a linear sequence. Consequently, we introduce the
data type list with complex object CO = (A B C) and bottom-test (on* (A
B C)). The generalized put-all function is structurally analogous to load-all
from the rocket domain:

(put-all olist s) ==
  (if (on* olist s)
      s
      (put (first olist) (second olist) (put-all (tail olist) s)))

where first and second are implemented as last and second-last, or
olist gives the desired order of the tower in reverse order.

For the puttable sub-plan we have one fragment consisting of a single step
- (puttable C) - for the reversed tower and a set of four sequences:

- A < C < B
- B < C < A
- B < A < C
- C < A < B

with (ct x) as bottom-test and the constructor (succ x) = y for (on y x)
as introduced above for linear plans. An obvious strategy compatible with
our general approach to plan transformation would be to select one of these
sequences and generalize a clear-all function identical to the unstack-all function
discussed above. There remains the problem of selecting the block which is to
be unstacked - that is, we have to infer the bottom-element from the goal-set
(ct a) (ct b) (ct c). As described for sorting above, we have to generate a
semantic selector function which does not depend on the parent-node, but on
the current state.¹³

¹³ Note that for selsort we introduced a selector in the basic operator swap. Here
we introduce a selector in the function clear-all which is already a recursive
generalization!
We have the following examples for constructing the selector function:

((on b c) (on c a)): (sel (A B C)) = A
((on a c) (on c b)), ((on c a) (on a b)): (sel (A B C)) = B
((on b a) (on a c)): (sel (A B C)) = C

that is, for the complex object (A B C) we always must select the element
which is the base of the current tower.

If we model the tower domain by explicit use of an ontable predicate, this
predicate can be used as criterion for the selector function. Without this
predicate, we can introduce (on* CO) - already generated for put-all - and
select the last element of the list. The resulting tower function then would
be:

tower(olist, s) = put-all(olist, clear-all(sel(s)))
sel(s) = last(make-olist(s)).

With the described strategy, the problem is reduced to an underlying
data type list with a semantic selector function. The selector function is semantic
because it is relevant which block must be cleared. This control rule
generates correct transformation sequences for towers with arbitrary numbers
of blocks with the desired sorting of blocks specified by olist, which is
generated from the top-level goals. But it does not generate the
optimal transformation sequences for all cases!
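The correct but suboptimal control rule can be traced with a small sketch. Several things here are assumptions of the example rather than parts of the described system: a state is a list of (on x y) and (ct x) literals with at most one tower, olist names the goal tower from top to bottom (so that plain first, second, and rest can be used), sel finds the base of the current tower directly from the on-literals instead of via make-olist, and the applied actions are recorded in the hypothetical variable *actions*.

```lisp
;; Sketch only: states are lists of (on x y) and (ct x) literals, OLIST names
;; the goal tower from top to bottom, and all helper names are illustrative.

(defvar *actions* '() "Actions applied so far, most recent first.")

(defun below (x s)
  "The block X is lying on in S, or NIL if X is on the table."
  (third (find-if (lambda (lit) (and (eq (first lit) 'on) (eql (second lit) x)))
                  s)))

(defun topof (x s)
  "The block lying directly on X in S, or NIL if X is clear."
  (second (find-if (lambda (lit) (and (eq (first lit) 'on) (eql (third lit) x)))
                   s)))

(defun puttable (x s)
  "Move the clear block X from the block below it to the table."
  (push (list 'puttable x) *actions*)
  (let ((y (below x s)))
    (cons (list 'ct y) (remove (list 'on x y) s :test #'equal))))

(defun put (x y s)
  "Put the clear block X, assumed to lie on the table, onto the clear block Y."
  (push (list 'put x y) *actions*)
  (cons (list 'on x y) (remove (list 'ct y) s :test #'equal)))

(defun clear-all (x s)
  "Unstack every block above X."
  (if (or (null x) (null (topof x s)))
      s
      (puttable (topof x s) (clear-all (topof x s) s))))

(defun sel (s)
  "Base of the current tower: a block that carries a block but lies on the table."
  (third (find-if (lambda (lit)
                    (and (eq (first lit) 'on) (null (below (third lit) s))))
                  s)))

(defun on* (olist s)
  "True if the blocks in OLIST are stacked in this order in S."
  (every (lambda (x y) (member (list 'on x y) s :test #'equal))
         olist (rest olist)))

(defun put-all (olist s)
  (if (on* olist s)
      s
      (put (first olist) (second olist) (put-all (rest olist) s))))

(defun tower (olist s)
  (put-all olist (clear-all (sel s) s)))
```

For the reversed tower ((on c b) (on b a) (ct c)) and olist (a b c), the sketch returns the goal state ((on a b) (on b c) (ct a)) after the four actions (puttable c), (puttable b), (put b c), (put a b), which is one action more than the sequence (puttable c), (put b c), (put a b).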

For generating optimal transformation sequences, we must cover the cases
where a block can be put onto another block immediately, without putting
it on the table first. For the three-block plan, there is only one such case and
we could come up with the discriminating predicate (on c b):

tower(olist, s) = put-all(olist, if(on(c, b, s),
                                    puttable(c, s),
                                    clear-all(sel(s))))

which generates incorrect plans for larger problems, for example for the state
((on c b) (on b a) (on a d) (ct c)).

For both variants of tower a generate-and-test strategy would discover
the flaw: For the first variant, it would be detected that for ((on c b) (on
b a) (ct c)) an additional action (puttable b) would be generated which is
not included in the optimal universal plan. For the second variant, all states
of the 3-block plan are covered correctly - the faulty condition would only
be detected when checking larger problems. But, with only one special case
of a reversed tower in the three-block plan, every other hypothesis would be
highly speculative. Therefore, we now will investigate the four-block plan.

The universal plan for the four-block tower problem is a DAG with 73
nodes and 78 edges, thus we have to extract a suitable minimal spanning tree.
Because the plan is rather large, we present an abstract version in figure 8.19
and a summary of the action sequences for all 33 leaf nodes in table 8.11.

For the four-block problem, we have 15 sequences needing to put all blocks
on the table and 8 cases with shorter optimal plans (only counting leaf nodes)
- in contrast to 5 versus 1 cases for the 3-block tower. Additionally, we have not
only one possible partial tower (with 2 or more blocks stacked) but also a
two-tower case (with two towers consisting of two blocks). Only one path
in the plan makes it necessary to interleave put and puttable: if D is on top
and C immediately under it. There are four cases where puttable has to be
performed only two times before the put actions are applied and one case
where puttable has to be performed only once. For one case, only two puttables
and two puts have to be applied. For the case of pairs of towers, all three put-actions
have to be performed for each leaf; puttable has to be performed once or
twice.

Fig. 8.19. Abstract Form of the Universal Plan for the Four-Block Tower

The underlying data type is now not just a list, but a more complicated
structure, where for example (D C A B) < (D A B C). Again, the order
is derived from the number of actions needed to transform a state into the
goal state: a tower (D C A B) can be transformed into the goal using
two puttable actions and three put actions, while for (D A B C) three puttable
actions and three put actions are needed (see tab. 8.11). Currently, we do
not see an easy way to extract all conditions for generating optimal action
sequences from the universal plan. Either we have to be content with the
correct but suboptimal control rules inferred from the three-block plan, or we
have to rely on incremental learning. A program which covers all conditions
for generating optimal plans is given in appendix CC.6.

One path we want to investigate in the future is to model the domain
specification in a slightly different way - using only a single operator (put
block loc) where loc can be a block or the table. This makes the universal
plan more uniform. There is no longer a decision to be made about which operator
to apply next. Instead, the decision whether a block is put on another block
or on the table can be included in the “semantic” selector function.

8.6.2.4 Set of Lists. Some deeper insight into the structure of the tower
problem might be gained by analyzing the analogous abstract problem. The
number of states as a function of the number of blocks is given in
table 2.3. This sequence corresponds to the number of sets of lists: a(n)=(2n-
Table 8.11. Transformation Sequences for Leaf-Nodes of the Tower Plan for Four
Blocks

15 4-towers, needing 3 puttable actions

((on a b) (on b d) (on d c) (ct a))  (PUTTABLE A) (PUTTABLE B) (PUTTABLE D) (PUT C D) (PUT B C)
((on a c) (on c b) (on b d) (ct a))  (PUTTABLE A) (PUTTABLE C) (PUTTABLE B) (PUT C D) (PUT B C)
((on a c) (on c d) (on d b) (ct a))  (PUTTABLE A) (PUTTABLE C) (PUTTABLE D) (PUT C D) (PUT B C)
((on a d) (on d b) (on b c) (ct a))  (PUTTABLE A) (PUTTABLE D) (PUTTABLE B) (PUT C D) (PUT B C)
((on b a) (on a d) (on d c) (ct b))  (PUTTABLE B) (PUTTABLE A) (PUTTABLE D) (PUT C D) (PUT B C)
((on b c) (on c a) (on a d) (ct b))  (PUTTABLE B) (PUTTABLE C) (PUTTABLE A) (PUT C D) (PUT B C)
((on b c) (on c d) (on d a) (ct b))  (PUTTABLE B) (PUTTABLE C) (PUTTABLE D) (PUT C D) (PUT B C)
((on b d) (on d a) (on a c) (ct b))  (PUTTABLE B) (PUTTABLE D) (PUTTABLE A) (PUT C D) (PUT B C)
((on c a) (on a b) (on b d) (ct c))  (PUTTABLE C) (PUTTABLE A) (PUTTABLE B) (PUT C D) (PUT B C)
((on c a) (on a d) (on d b) (ct c))  (PUTTABLE C) (PUTTABLE A) (PUTTABLE D) (PUT C D) (PUT B C)
((on c b) (on b a) (on a d) (ct c))  (PUTTABLE C) (PUTTABLE B) (PUTTABLE A) (PUT C D) (PUT B C)
((on c b) (on b d) (on d a) (ct c))  (PUTTABLE C) (PUTTABLE B) (PUTTABLE D) (PUT C D) (PUT B C)
((on d a) (on a b) (on b c) (ct d))  (PUTTABLE D) (PUTTABLE A) (PUTTABLE B) (PUT C D) (PUT B C)
((on d b) (on b a) (on a c) (ct d))  (PUTTABLE D) (PUTTABLE B) (PUTTABLE A) (PUT C D) (PUT B C)
((on c d) (on d a) (on a b) (ct c))  (PUTTABLE C) (PUTTABLE D) (PUTTABLE A) (PUT C D) (PUT B C)
                                     ((put c d) (puttable a) also possible)

6 4-towers, needing 2 puttable actions

((on a d) (on d c) (on c b) (ct a))
((on b d) (on d c) (on c a) (ct b))
((on c d) (on d b) (on b a) (ct c))
((on d a) (on a c) (on c b) (ct d))
( (on d b) (on b c) (on c a) (ct d)) 
( (on d c) (on c a) (on a b) (ct d)) 



2 4-towers, needing 4 actions 
( (on b a) (on a c) (on c d) (ct b)) 

( (on d c) (on c b) (on b a) (ct d)) 

(sorted tower, 0 actions is root of plan) 

5 2-tower pairs, needing 2 puttable actions 



(PUTTABLE A) (PUTTABLE D) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE B) (PUTTABLE D) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE C) (PUTTABLE D) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUTTABLE A) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUTTABLE B) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUT C D) (PUTTABLE A) (PUT B C) (PUT A B) 
((put c d) BEFORE (puttable a)!) 



(PUTTABLE B) (PUTTABLE A) (PUT B C) (PUT A B) 
(PUTTABLE D) (PUT C D) (PUT B C) (PUT A B) 



(PUTTABLE B) (PUTTABLE A) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUTTABLE A) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE A) (PUTTABLE B) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUTTABLE B) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUTTABLE A) (PUT C D) (PUT B C) (PUT A B) 

((put c d) (puttable a) also possible) 

5 2-tower pairs, needing 1 puttable actions 



( (on a c) (on b d) (ct a) (ct b)) 
( (on a c) (on d b) (ct a) (ct d)) 
( (on a d) (on b c) (ct a) (ct b)) 
( (on b c) (on d a) (ct b) (ct d)) 
( (on a b) (on d c) (ct a) (ct d)) 



(PUTTABLE A) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE B) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUT C D) (PUT B C) (PUT A B) 

(PUTTABLE D) (PUT C D) (PUT B C) (PUT A B) 

2 2-tower pairs, needing no (put c d) action 

( (on a b) (on c d) (ct a) (ct c)) (PUTTABLE A) (PUT B C) (PUT A B) 

( (on b a) (on c d) (ct b) (ct c)) (PUT B C) (PUT A B) 

are no leafs) 



( (on a d) (on c b) (ct a) (ct c)) 
( (on b a) (on d c) (ct b) (ct d)) 
( (on b d) (on c a) (ct b) (ct c)) 
( (on c a) (on d b) (ct c) (ct d)) 
( (on c b) (on d a) (ct c) (ct d)) 



(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 
(PUT A B) 



l)a(n-l) - (n-l)(n-2)a(n-2). For n > 1 it is the row sum of the “unsigned 
Lah-triangle” (Knuth, 1992). The corresponding formula is exp{x / {1 — x)) 
The tower problem is related to generating the power-set of a list with 
mutually different elements (see tab. 8.12). But there is also a difference 
between the two domains: For powerset each element of the set is a set again, 
that is, for example {{a}, {b, c}, {&}} is equal to {{a}, {c, b}, {&}}. In 
contrast, for tower, the elements of the sets are lists. For example {(a), (b, 
c), (b)} is equal to {(b), (a), (b, c)} but not to (c, b), (b)}. A program 

generating all different sets of lists (that is towers) can be easily generated 
from powerset by changing :test ’setequal to :test 'equal in ins-el. 
The tower domain is the inverse problem to set of lists: For sets of lists a single 
list is decomposed in all possible partial lists. For tower each state corresponds 
to a set of partial lists and the goal state is the set containing a single list 
with all elements in a fixed order. The (puttable x) operator corresponds to 
removing an element from a list and generating a new one-element list (cons 

The identification of the sequence was researched by Bernhard Wolf. More back- 
ground information can be found at http://www.research.att.com/cgi-bin/ 
access . cgi/as/njas/sequences/eisA . cgi?Anmn=000262. 
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(car 1) nil), the (put x y) operator corresponds to removing an element 
from a list and putting it in front of another list (cons (car 11) 12). A 
program generating a list of sorted numbers is given in appendix CC.6. 
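The correspondence between tower states and sets of lists can be checked numerically. The following is a minimal sketch in Python, not the book's Lisp program; the function names `a_recurrence` and `tower_states` are hypothetical, and the start values a(0) = a(1) = 1 are an assumption consistent with a single state for one block. It compares the recurrence with a brute-force enumeration of all sets of towers.

```python
from itertools import permutations

def a_recurrence(n_max):
    """Return [a(0), ..., a(n_max)] using
    a(n) = (2n-1) a(n-1) - (n-1)(n-2) a(n-2), with a(0) = a(1) = 1 (assumed)."""
    a = [1, 1]
    for n in range(2, n_max + 1):
        a.append((2 * n - 1) * a[n - 1] - (n - 1) * (n - 2) * a[n - 2])
    return a[: n_max + 1]

def tower_states(blocks):
    """Enumerate all states of the tower domain as sets of lists.

    Every permutation of the blocks is cut into consecutive non-empty
    towers in every possible way; a state is the *set* of its towers,
    so the order of towers on the table does not matter."""
    states = set()
    for perm in permutations(blocks):
        n = len(perm)
        for mask in range(1 << max(n - 1, 0)):   # one bit per possible cut
            towers, start = [], 0
            for i in range(1, n):
                if (mask >> (i - 1)) & 1:
                    towers.append(perm[start:i])
                    start = i
            towers.append(perm[start:])
            states.add(frozenset(towers))
    return states

if __name__ == "__main__":
    print(a_recurrence(5))
    print([len(tower_states("abcd"[:n])) for n in range(1, 5)])
```

Under these assumptions, both computations agree on 13 states for three blocks and 73 states for four blocks.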



Table 8.12. Power-Set of a List, Set of Lists

(defun powerset (l) (pset l (1+ (length l)) (list (list nil))))

; pset: extend the current collection ps c times by inserting elements of l
(defun pset (l c ps)
  (cond ((= 0 c) nil)
        (T (union ps (pset l (1- c) (ins-el l ps)))) ))

; ins-el: adjoin each element of l to each member of ps
(defun ins-el (l ps)
  (cond ((null l) nil)
        (T (union (mapcar #'(lambda (y) (adjoin (car l) y)) ps)
                  (ins-el (cdr l) ps) :test 'setequal)) ))

; for set of lists :test 'equal
(defun setequal (s1 s2) (and (subsetp s1 s2) (subsetp s2 s1)))



8.6.2.5 Concluding Remarks on ‘Tower’. Inference of generalized control
knowledge for the tower domain was also investigated in the context of
two alternative approaches. One of these approaches is genetic programming
(sect. 6.3.2 in chap. 6). Within this approach, given primitive operators of
some functional programming language together with rules for the correct
generation of terms, a set of input/output examples, and an evaluation
function (representing knowledge about “good” solutions), a program covering
all I/O examples correctly is generated by search in the “evolution
space” of programs. Programs generated by this approach are given in figure
6.4. These programs correspond to the “linear” program discussed above.
Because all blocks are always put on the table first and the tower is
constructed afterwards, the program does not generate optimal transformation
sequences for all possible cases.
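The “linear” strategy can be illustrated with a small sketch. The Python code below is an illustration only and not one of the programs of figure 6.4; the state representation (a list of towers, each written top-first as in the text) and the names `linear_plan` and `execute` are assumptions made for this example.

```python
def linear_plan(state, goal=("A", "B", "C", "D")):
    """Unstack everything, then rebuild the goal tower bottom-up.

    state: list of towers, each a tuple of blocks written top-first,
           e.g. ("D", "C", "A", "B") means D on C on A on B."""
    plan = []
    for tower in state:
        # every block except the bottom one is put on the table, top first
        for block in tower[:-1]:
            plan.append(("puttable", block))
    # goal pairs (A,B), (B,C), (C,D) are stacked in reverse order
    for upper, lower in reversed(list(zip(goal, goal[1:]))):
        plan.append(("put", upper, lower))
    return plan

def execute(state, plan):
    """Apply a plan to a state; only clear (topmost) blocks may be moved."""
    towers = [list(t) for t in state]
    for action in plan:
        block = action[1]
        src = next(t for t in towers if t[0] == block)  # block must be clear
        src.pop(0)
        if action[0] == "puttable":
            towers.append([block])
        else:
            dst = next(t for t in towers if t and t[0] == action[2])
            dst.insert(0, block)
        towers = [t for t in towers if t]               # drop emptied towers
    return sorted(tuple(t) for t in towers)

if __name__ == "__main__":
    start = [("D", "C", "A", "B")]
    plan = linear_plan(start)
    print(len(plan), plan)
    print(execute(start, plan))
```

For the tower (D C A B) this sketch produces six actions, whereas table 8.11 lists a five-action sequence for the same state, in which (put c d) is applied before (puttable a).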

The second approach, introduced by Martin and Geffner (2000), infers
rules from plans for sample input states (see sect. 2.5.2 in chap. 2). The
domain is modelled in a concept language (an AI knowledge representation
language) and the rules are inferred with a decision list learning approach.
The resulting rules are given in table 8.13. For example, rule A3 represents
the knowledge that a block should be picked up if it is clear, and if its
target block is clear and “well-placed”. With these rules, 95.5% of 1000 test
problems were solved for 5-block problems and 72.2% of 500 test problems were
solved for 20-block problems. The generated plans are about two steps longer
than the optimal plans. The authors could show that after a selective
extension of the training set by the input states for which the original rules
failed to generate a correct plan, a more extensive set of rules is generated
for which the generated plans are about one step longer than the optimal plans.

Our approach differs from these two approaches in two aspects: First, we
do not use example sets of input/output pairs or of input/plan pairs but
we analyze the complete space of optimal solutions for a problem of small
size. Second, we do not rely on incremental hypothesis-construction, that is,
a learning approach where each new example is used to modify the current
hypothesis if this example is not covered in the correct way. Instead, we
aim at extracting the control knowledge from the given universal plan by
exploiting the structural information contained in it. Although we have failed
up to now to generate optimal rules for tower, we could show for sequence, set,
and list problems that with our analytical approach we can extract correct
and optimal rules from the plan.

Table 8.13. Control Rules for Tower Inferred by Decision List Learning

A1: PUT-ON ((on_g = on_s) ∧ (∀on_g^-1.holding) ∧ clear_s)
A2: PUT-ON-TABLE (holding)
A3: PICK ((∀on*_g.(on_g = on_s)) ∧ (∀on_g.clear_s) ∧ clear_s)
A4: PICK (¬(on*_g = on*_s) ∧ (∀on_s.(∀on_g^-1.clear_s)) ∧ clear_s)
A5: PICK (¬(on_g = on_s) ∧ (∀on*_g.(on_g = on_s)) ∧ clear_s)
A6: PICK (¬(on_g = on_s) ∧ (∀on_s.(∀on_g^-1.clear_s)))

There is a trade-off between the optimality of the policy on the one hand and
(a) the efficiency of control knowledge application and (b) the efficiency of
control knowledge learning on the other. As we can see from the program
presented in appendix CC.6 and from the (still non-optimal!) control rules in
table 8.13, generating minimal action sequences might involve complex tests
which have to be performed on the current state. In the worst case, these
tests again involve recursion, for example, a test whether a “well-placed”
partial tower already exists (test subtow in our program). Furthermore, we
demonstrated that the suboptimal control rules for tower could be extracted
quite easily from the 3-block plan, while automatic extraction of the optimal
rules from the 4-block plan involves complex reasoning (for generating the
tests for “special” cases).

8.6.3 Tower of Hanoi 

Up to now, we did not investigate plan transformation for the Tower of Hanoi. 
Thus, we will make just some more general remarks about this domain. Tower 
of Hanoi can be seen as a modified tower problem: In contrast to tower, 
where blocks can be put on arbitrary positions on a table, in Tower of Hanoi 
the positions of discs are restricted to some (typically three) positions of 
pegs. This results in a more regular structure of the DAG than for the tower 
problem and therefore, we hope that if we come up with a plan transformation 
strategy for tower, the Tower of Hanoi domain is covered by this strategy as 
well. 

It is often claimed that hanoi is a highly artificial domain, and that
the only isomorphic domains are hand-crafted puzzles, as for example the
monster problems (Simon & Hayes, 1976; Clement & Richard, 1997). I want
to point out that there are solitaire (“patience”) games which are isomorphic
to hanoi.^ One of these solitaire games (freecell) was included in the
AIPS-2000 planning competition.

The domain specification for hanoi is given in table 3.8. The resulting plan
is a unique minimal spanning tree, which is already regular (see fig. 3.9). This
indicates that data type inference and the resulting plan transformation should
be easier than for the tower problem. While hanoi with three discs contains
more states than the three-block tower domain (27 to 13), the actions for
transforming one state into another are much more restricted. The number
of states for hanoi with n discs is $3^n$. The minimal number of moves when
starting with a complete tower on one peg is $2^n - 1$. Up to now, there seems
to be no general formula to calculate the minimal number of moves for an
arbitrary starting state - that is, one of the nodes of the universal plan.^
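The two counts can be reproduced by exhaustive search for small n. The following Python sketch is an illustration under stated assumptions and is not DPlan: a state is encoded as a tuple giving the peg (0, 1 or 2) of each disc, disc 0 being the smallest, and the hypothetical function `distances_to_goal` performs a breadth-first search backwards from the goal, which yields the minimal number of moves for every starting state.

```python
from collections import deque

def neighbours(state):
    """All states reachable with one legal move.

    state[d] is the peg (0..2) of disc d; disc 0 is the smallest."""
    result = []
    for peg in range(3):
        discs = [d for d, p in enumerate(state) if p == peg]
        if not discs:
            continue
        top = min(discs)                      # only the top disc may move
        for target in range(3):
            if target == peg:
                continue
            # legal if every disc already on the target peg is larger
            if all(d > top for d, p in enumerate(state) if p == target):
                result.append(state[:top] + (target,) + state[top + 1:])
    return result

def distances_to_goal(n, goal_peg=2):
    """Minimal number of moves from every state to the complete tower on goal_peg."""
    goal = (goal_peg,) * n
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for t in neighbours(s):               # moves are reversible
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist

if __name__ == "__main__":
    for n in range(1, 5):
        dist = distances_to_goal(n)
        print(n, len(dist), dist[(0,) * n])
```

The dictionary returned for three discs has 27 entries, one per node of the universal plan, and the complete tower on another peg is at distance 7.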

^ We plan to conduct a psychological experiment in the domain of problem
solving by analogy, demonstrating that subjects who are acquainted with playing
patience games perform better on hanoi than subjects with no such experience.

^ see: http://forum.swarthmore.edu/epigone/geometry-puzzles/twimclehmeh/7oen0r212cwy@forum.swarthmore.edu,
open question from February 2000

Tower of Hanoi is a puzzle investigated extensively in artificial intelligence
as well as in cognitive psychology since the 1960s. In computer science classes,
Tower of Hanoi is used as a prototypical example of a problem with exponential
effort. Coming up with efficient algorithms (for restricted variants) of
the Tower of Hanoi problem is still ongoing research (Atkinson, 1981;
Pettorossi, 1984; Walsh, 1983; Allouche, 1994; Hinz, 1996). As far as we have
surveyed the literature, all algorithms are concerned with the case where a
tower of n discs is initially located at a predefined start peg (see for
example table 8.14). In general, hanoi is μ-recursive already for the
restricted case where the initial state is fixed and only the number of discs
is variable, with the structure hanoi ∘ move ∘ hanoi. A standard
implementation, as shown in table 8.14, is tree-recursive.

Table 8.14. A Tower of Hanoi Program

; (SETQ A '(1 2 3) B NIL C NIL) (start) OR
; (hanoi '(1 2 3) nil nil 3)

(DEFUN move (from to)
  (COND ( (NULL (EVAL from)) (PRINT (LIST 'PEG from 'EMPTY)) )
        ( (OR (NULL (EVAL to))
              (> (CAR (EVAL to)) (CAR (EVAL from))) )
          (SET to (CONS (CAR (EVAL from)) (EVAL to)))
          (SET from (CDR (EVAL from)))
        )
        ( T (PRINT (LIST 'MOVE 'FROM (CAR (EVAL from))
                         'TO (CAR (EVAL to)) 'NOT 'POSSIBLE)))
  )
  (LIST (LIST 'MOVE 'DISC (CAR (EVAL to)) 'FROM from 'TO to)) )

(DEFUN hanoi (from to help n)
  (COND ( (= n 1) (move from to) )
        ( T (APPEND
              (hanoi from help to (- n 1))
              (move from to)
              (hanoi help to from (- n 1))
        ) ) ) )

(DEFUN start () (hanoi 'A 'B 'C (LENGTH A)))

We are interested in learning a control strategy starting with an arbitrary
initial state (see program in table 8.15).



Table 8.15. A Tower of Hanoi Program for Arbitrary Starting Constellations

(DEFUN ison (disc peg)
  (COND ( (NULL peg) NIL)
        ( (= (CAR peg) disc) T)
        ( T (ison disc (cdr peg)))))

(DEFUN on (disc from to help)
  (COND ( (ison disc (eval from)) from)
        ( (ison disc (eval to)) to)
        ( T help)))

; whichpeg: peg on which the current disc is NOT lying and peg which
; is not current goal peg
(DEFUN whichpeg (disc peg)
  (COND ( (or (and (equal (on disc 'A 'B 'C) 'B) (equal peg 'C))
              (and (equal (on disc 'A 'B 'C) 'C) (equal peg 'B)) ) 'A)
        ( (or (and (equal (on disc 'A 'B 'C) 'A) (equal peg 'C))
              (and (equal (on disc 'A 'B 'C) 'C) (equal peg 'A)) ) 'B)
        ( (or (and (equal (on disc 'A 'B 'C) 'A) (equal peg 'B))
              (and (equal (on disc 'A 'B 'C) 'B) (equal peg 'A)) ) 'C) ))

(DEFUN topof (peg)
  (COND ( (null (car (eval peg))) nil) ( T (car (eval peg))) ))

(DEFUN clearpeg (peg)
  (COND ( (null (car (eval peg))) T) ( T nil) ))

(DEFUN cleartop (disc)
  (COND ( (and (equal (on disc 'A 'B 'C) 'A) (= (car A) disc)) T)
        ( (and (equal (on disc 'A 'B 'C) 'B) (= (car B) disc)) T)
        ( (and (equal (on disc 'A 'B 'C) 'C) (= (car C) disc)) T)
        ( T nil)))

(DEFUN gmove (disc peg)
  (COND ( (= disc 0) (PRINT (LIST 'NO 'DISC)))
        ( (equal (on disc 'A 'B 'C) peg)
          (PRINT (LIST 'Disc disc 'IS 'ON 'PEG peg)) )
        ( (OR (clearpeg peg) (> (topof peg) disc))
          (PRINT (LIST 'MOVE 'DISC disc
                       'FROM (on disc 'A 'B 'C)
                       'TO peg ) )
          (SET (on disc 'A 'B 'C) (CDR (eval (on disc 'A 'B 'C))))
          (SET peg (CONS disc (EVAL peg)))
        )
        ( T (PRINT (LIST 'MOVE 'FROM disc 'ON peg 'NOT 'POSSIBLE)))))

(DEFUN ghanoi (disc peg)
  (COND ( (and (= disc 1) (equal (on disc 'A 'B 'C) peg)) T )
        ( T (COND
              ( (equal (on disc 'A 'B 'C) peg) (ghanoi (- disc 1) peg) )
              ( (and (not (equal (on disc 'A 'B 'C) peg))
                     (not (and (cleartop disc) (clearpeg peg)))
                     (> disc 1)) (ghanoi (- disc 1) (whichpeg disc peg)) )
            )
            (gmove disc peg)
            (COND ((> disc 1) (ghanoi (- disc 1) peg))) )))

(DEFUN n-of-discs (p1 p2 p3) (+ (LENGTH p1) (+ (LENGTH p2) (LENGTH p3))))

; ghanoi: "largest" Disc x Goal-Peg -> Solution Sequence
(DEFUN gstart () (ghanoi (n-of-discs A B C) 'C))
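The idea of solving Tower of Hanoi from an arbitrary legal constellation can be condensed into a few lines. The Python sketch below is not a transcription of the Lisp program above; it is a simplified illustration that assumes the same encoding as a list mapping each disc to its peg (0, 1 or 2; disc 0 is the smallest), and the name `solve` is hypothetical. It always treats the largest disc that is not yet on the goal peg first.

```python
def solve(state, n, target, moves):
    """Bring discs 0..n-1 to peg `target`, starting from any legal state.

    state: mutable list, state[d] = peg (0, 1 or 2) of disc d
    moves: list collecting (disc, from_peg, to_peg) triples"""
    if n == 0:
        return
    disc = n - 1                              # largest disc still to be placed
    if state[disc] == target:
        solve(state, n - 1, target, moves)    # already in place, go on below it
    else:
        spare = 3 - state[disc] - target      # the third peg
        solve(state, n - 1, spare, moves)     # gather all smaller discs there
        moves.append((disc, state[disc], target))
        state[disc] = target
        solve(state, n - 1, target, moves)    # bring the smaller discs on top

if __name__ == "__main__":
    state, moves = [1, 2, 0], []
    solve(state, 3, 2, moves)
    print(moves)
    print(state)
```

For the standard start with all discs on one peg the sketch reduces to the tree-recursive structure of table 8.14.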
9.1 Combining Planning and Program Synthesis

We demonstrated that planning can be combined with inductive program 
synthesis by first generating a universal plan for a problem with a small 
number of objects, then transforming this plan into a finite program term, 
and folding this term into a recursive program scheme. While planning and 
folding can be performed by powerful, domain-independent algorithms, plan 
transformation is knowledge dependent. In part I, we presented the domain- 
indpendent universal planner DPlan. In this part, we presented an approach 
to folding finite program terms based on pattern-matching which is more 
powerful than other published approaches. 

Our approach to plan transformation, presented in the previous chapter,
is based on inference of the data type underlying the planning domain.
Typically, such knowledge is pre-defined in program synthesis, for example,
as domain axiom - as in the deductive approach of Manna and Waldinger
(1975) - or as inherent restriction of the input domain - as in the inductive
approach of Summers (1977). We go beyond these approaches, providing a
method to infer the data type from the structure of the plan where inference
is based on a set of predefined abstract types. Furthermore, we presented first
ideas for dealing with problems relying on semantic knowledge. In program
synthesis, typically, only structural list problems (such as reverse) are
considered. Planning problems which can be solved by using structural knowledge
only are, for example, clearblock and rocket. In clearblock, relations between
objects (on(x, y)) are independent of attributes of these objects (such as
their size). In rocket, relations between objects are irrelevant. In contrast,
sorting lists depends on a semantic relation between objects, that is, whether
one number is greater than another. If a universal plan has parallel branches
representing the same operation but for different objects, we assume that a
semantic selector function must be introduced. The selector is constructed by
identifying discriminating literals between the underlying problem states (see
sect. 8.5 in chap. 8). To sum up, with our current approach we can deal with
structural problems in a fully automatic way from planning over plan
transformation to folding. Dealing with problems relying on semantic
information is a topic for further research.

Currently we are not exploiting the full power of the folder when
generalizing over plans: In plan transformation, we try to come up with a
single, linear structure as input to the folder although the folder allows us
to infer sets of recursive equations with arbitrarily complex recursive
structures. In future research we plan to investigate possibilities for
submitting complex, non-linear plans directly to the folder.

A drawback of using planning as the basis for constructing finite programs is
that number problems, such as factorial or fibonacci, cannot be dealt with
in a natural way. For such problems, our folder must rely on input traces
provided by a user. Using a graphical user interface to support the user in
constructing such traces is discussed by Schrödl and Edelkamp (1999).



9.2 Acquisition of Problem Solving Strategies 

The most important building blocks for the flexible and adaptive nature of
human cognition are powerful mechanisms of learning.^ On the low-level end
of learning mechanisms is stimulus-response learning, mostly modelled with
artificial neural nets or reinforcement learning. On the high-level end are
different principles of induction, that is, generalizing rules from examples.
While the majority of work in machine learning focusses on induction of
concepts (classification learning, see sect. 6.3.1.1 in chap. 6), our work
focusses on inductive learning of cognitive skills from problem solving. While
concepts are mostly characterized as declarative knowledge (know what) which
can be verbalized and is accessible for reasoning processes, skills are
described as highly automated procedural knowledge (know how) (Anderson, 1983).

9.2.1 Learning in Problem Solving and Planning 

Problem solving is generally realized as heuristic search in a problem space.
In cognitive science most work is in the area of goal driven production
systems (see sect. 2.4.5.1 in chap. 2). In AI, different planning techniques
are investigated (see sect. 2.4.2 in chap. 2). In both frameworks, the
definition of problem operators together with conditions for their
application - that is, production rules - is central. Matching, selection, and
application of operators is performed by an interpreter (control strategy): the
preconditions of all operators defined for a given domain are matched against
the current data (problem state), one of the matching operators is selected and
applied to the current state. The process terminates if the goal is fulfilled
or if no operator is applicable.
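The match-select-apply cycle described above can be sketched as a small interpreter. The Python code below is an illustration under assumptions, not the implementation of any of the cited systems: states and conditions are sets of ground literals written as strings, operators are dictionaries with hypothetical keys "pre", "add" and "del", and selection simply takes the first matching operator.

```python
def run(state, goal, operators, select=lambda ops: ops[0], max_steps=100):
    """Match, select and apply operators until the goal holds or nothing matches.

    state, goal: sets of ground literals (strings)
    operators:   list of dicts with keys "name", "pre", "add", "del"
    Returns the final state and the sequence of applied operator names."""
    trace = []
    for _ in range(max_steps):
        if goal <= state:                       # goal fulfilled
            break
        applicable = [op for op in operators if op["pre"] <= state]   # match
        if not applicable:                      # no operator applicable
            break
        op = select(applicable)                 # select
        state = (state - op["del"]) | op["add"] # apply
        trace.append(op["name"])
    return state, trace

# Ground instances of puttable for a tower A on B on C (clearblock example).
OPERATORS = [
    {"name": "puttable(A)", "pre": {"clear(A)", "on(A,B)"},
     "add": {"ontable(A)", "clear(B)"}, "del": {"on(A,B)"}},
    {"name": "puttable(B)", "pre": {"clear(B)", "on(B,C)"},
     "add": {"ontable(B)", "clear(C)"}, "del": {"on(B,C)"}},
]

if __name__ == "__main__":
    start = {"on(A,B)", "on(B,C)", "ontable(C)", "clear(A)"}
    final, trace = run(start, {"clear(C)"}, OPERATORS)
    print(trace)
```

In this tiny domain exactly one operator matches in every state, so no search is needed; in general the `select` argument is the place where a heuristic or learned control knowledge would enter.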

Skill acquisition is usually modelled as composition of predefined primitive
operators as a result of their co-occurrence during problem solving, that is,
learning by doing. This is true in cognitive modelling (knowledge compilation
in ACT, Anderson & Lebiere, 1998; operator chunking in Soar, Rosenbloom
& Newell, 1986) as well as in AI planning (macro learning, see sect. 2.5.2
in chap. 2). Acquisition of such “linear” macros can result in a reduction of
search, because now composite operators can be applied instead of primitive
ones. In cognitive science, operator composition is viewed as the mechanism
responsible for the acquisition of automatisms and as the main explanation for
speed-up effects of learning (Anderson et al., 1989).

^ This section is a short version of the previous publications Schmid and
Wysotzki (1996) and Schmid and Wysotzki (2000c).

In contrast, in AI planning, learning of domain specific control knowledge,
that is, learning of problem solving strategies, is investigated as an
additional mechanism, as discussed in section 2.5.2 in chapter 2. One
possibility to model acquisition of control knowledge is learning of “cyclic”
macros (Shell & Carbonell, 1989; Shavlik, 1990). Learning a problem solving
strategy ideally eliminates search completely because the complete sub-goal
structure of a problem domain is known. For example, a macro for a one-way
transportation problem such as rocket represents the strategy that all objects
must be loaded before the rocket moves to its destination (see sect. 3.1.4.2 in
chap. 3 and sect. 8.4 in chap. 8). There is empirical evidence, for example in
the Tower of Hanoi domain, that people can acquire this kind of knowledge
(Anzai & Simon, 1979).

9.2.2 Three Levels of Learning 

We propose that our system, as given in figure 1.2, provides a general
framework for modeling the acquisition of strategies from problem solving
experience: The starting point is a problem specification, given as primitive
operators, their application conditions, and a problem solving goal. Using
DPlan, the problem is explored, that is, operator sequences for transforming
some initial states into a state fulfilling the goals are generated. Using
plan transformation, this experience is integrated into a finite program,
corresponding roughly to a set of operator chunks as discussed above.
Subsequently, this experience is generalized to a recursive program scheme
(RPS) using a folding technique. That is, the system infers a domain specific
control strategy which simultaneously represents the (goal) structure of the
current domain. Alternatively - as we will discuss in part III - after initial
exploration a similar problem might be retrieved from memory. That is, the
system recognizes that the new problem can be solved with the same strategy as
an already known problem. In this case, a further generalized scheme,
representing the abstract strategy for solving both problems, is learned.

All three steps of learning are illustrated in figure 9.1 for the simple
clearblock example which was discussed in chapters 3, 7, and 8: To reach the
goal that block C has a clear top, all blocks lying above C have to be put on
the table. This can be done by applying the operator puttable(x) if block
x has a clear top. The finite program represents the experience with three
initial states as a nested conditional expression. The most important aspect
of this finite program, which makes it different from the cognitive science
approaches to operator chunking, is that objects are not referred to directly
by their name but with the help of a selector function (topof). Selector
functions are inferred from the structure of the universal plan (as described
in chap. 8). This process in a way captures the evolution of perceptual chunks
from problem solving experience (Koedinger & Anderson, 1990): For the
clearblock example, the introduction of topof(x) represents the knowledge that
the relevant part of the problem is the block lying on top of the currently
focussed block. For more complex domains, such as Tower of Hanoi, the data type
represents “partial solutions” (such as how many discs are already in the
correct position or looking for the largest free disc).



Generalization over problem states:
Finite Program (Operator Chunk)

    IF cleartop(x)
    THEN s
    ELSE IF cleartop(topof(x))
         THEN puttable(topof(x), s)
         ELSE IF cleartop(topof(topof(x)))
              THEN puttable(topof(x),
                            puttable(topof(topof(x)), s))
              ELSE undefined

Generalization over recursive enumerable problem spaces:
Recursive Program Scheme (Problem Solving Strategy)

    clearblock(x, s) =
        IF cleartop(x)
        THEN s
        ELSE puttable(topof(x),
                      clearblock(topof(x), s))

Generalization over classes of problems:
Scheme Hierarchy

    r(x, v) =
        IF bo(x)
        THEN v
        ELSE op1(op2(x), r(op2(x), v))

    clearblock(x, s) =                        addx(x, y) =
        IF clear(x)                               IF eq0(x)
        THEN s                                    THEN y
        ELSE puttable(topof(x),                   ELSE plus(pred(x), addx(pred(x), y))
                      clearblock(topof(x), s))

Fig. 9.1. Three Levels of Generalization



In the second step, this primitive behavioral program is generalized over
recursively enumerable problem spaces: a strategy for clearing a block in
n-block problems is extrapolated and interpreted as a recursive program
scheme. Induction of generalized structures from examples is a fundamental
characteristic of human intelligence, as proposed for example by Chomsky
as “language acquisition device” (Chomsky, 1959) or by Holland, Holyoak,
Nisbett, and Thagard (1986). This ability to extract general rules from some
initial experience is captured in the presented technique of folding of finite
programs by detecting regularities.
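The difference between the finite program and the recursive program scheme for clearblock can be made concrete with a small executable sketch. The Python code below is an illustration under assumptions and is not output of the folding algorithm: a state is a dictionary mapping a block to the block lying directly on it, and the selector and operator functions take the state as an explicit second argument.

```python
def cleartop(x, s):
    """True if no block lies on x in state s."""
    return x not in s

def topof(x, s):
    """Selector: the block lying directly on x."""
    return s[x]

def puttable(b, s):
    """Put the clear block b on the table; returns the successor state."""
    assert cleartop(b, s)
    return {below: above for below, above in s.items() if above != b}

def clearblock_finite(x, s):
    """Finite program: covers towers with at most two blocks above x."""
    if cleartop(x, s):
        return s
    if cleartop(topof(x, s), s):
        return puttable(topof(x, s), s)
    if cleartop(topof(topof(x, s), s), s):
        return puttable(topof(x, s), puttable(topof(topof(x, s), s), s))
    raise ValueError("undefined")

def clearblock(x, s):
    """Recursive program scheme: covers towers of arbitrary height."""
    if cleartop(x, s):
        return s
    return puttable(topof(x, s), clearblock(topof(x, s), s))

if __name__ == "__main__":
    three = {"C": "B", "B": "A"}              # A on B on C
    four = {"D": "C", "C": "B", "B": "A"}     # A on B on C on D
    print(clearblock_finite("C", three), clearblock("C", three))
    print(clearblock("D", four))
```

Both versions agree on the states covered by the plan explored so far, while only the recursive version extrapolates to the four-block tower.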

The representation of problem schemes by recursive program schemes differs
from the representation formats proposed in cognitive psychology (Rumelhart
& Norman, 1981). But RPSs capture exactly the characteristics
which are attributed to cognitive schemes, namely that schemes represent
procedural knowledge (“knowledge how”) which the system can interrogate
to produce “knowledge that”, that is, knowledge about the structure of a
problem. Thereby problem schemes are suitable for modeling analogical
reasoning: The acquired scheme represents not only the solution strategy for
unstacking towers of arbitrary height but for all structurally identical
problems. Experience with a blocks-world problem can for example be used to
solve a numerical problem which has the same structure by re-interpreting
the meaning of the symbols of an RPS. After solving some problems with
similar structures, more general schemes evolve and problem solving can be
guided by abstract schemes.
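Re-interpreting the symbols of a scheme can be sketched as a higher-order function. In the following Python illustration the abstract scheme r(x, v) of figure 9.1 is written once and instantiated twice; the helper `make_scheme`, the fixed tower `ABOVE`, and the representation of the blocks-world result as a list of actions are assumptions made for this example.

```python
def make_scheme(b, op1, op2):
    """Abstract scheme r(x, v) = IF b(x) THEN v ELSE op1(op2(x), r(op2(x), v))."""
    def r(x, v):
        return v if b(x) else op1(op2(x), r(op2(x), v))
    return r

# Interpretation 1: the numerical problem addx of figure 9.1.
addx = make_scheme(
    b=lambda x: x == 0,            # eq0
    op1=lambda a, c: a + c,        # plus
    op2=lambda x: x - 1,           # pred
)

# Interpretation 2: clearblock for a fixed tower A on B on C.
ABOVE = {"C": "B", "B": "A"}       # block -> block lying directly on it
clearblock = make_scheme(
    b=lambda x: x not in ABOVE,                          # cleartop
    op1=lambda blk, plan: plan + [("puttable", blk)],    # puttable
    op2=lambda x: ABOVE[x],                              # topof
)

if __name__ == "__main__":
    print(addx(3, 0))
    print(clearblock("C", []))
```

The two instances share the recursive structure and differ only in the meaning assigned to b, op1 and op2, which is the kind of correspondence that the mapping between a base and a target problem has to establish.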



Part III 

Schema Abstraction 




10. Analogical Reasoning and Generalization 



“I wish you’d solve the case, Miss Marple, like you did the time Miss
Wetherby’s gill of picked shrimps disappeared. And all because it reminded
you of something quite different about a sack of coals.” “You’re laughing, my
dear,” said Miss Marple, “but after all, that is a very sound way of arriving
at the truth. It’s really what people call intuition and make such a fuss about.
Intuition is like reading a word without having to spell it out. A child can’t
do that because it has had so little experience. But a grown-up person knows
the word because they’ve seen it often before. You catch my meaning, Vicar?”
“Yes,” I said slowly, “I think I do. You mean that if a thing reminds you of
something else - well, it’s probably the same kind of thing.”

— Agatha Christie, The Murder at the Vicarage, 1930 



Analogical inference is a special case of inductive inference where knowledge
is transferred from a known base domain to a new target domain. Analogy is a
prominent research topic in cognitive science: In philosophy, it is discussed as
a source of creative thinking; in linguistics, similarity-creating metaphors are
studied as a special case of analogical reasoning; in psychology, analogical
reasoning and problem solving have been studied in numerous experiments
and there exist several process models of analogical problem solving and
learning; in artificial intelligence, besides some general approaches to analogy,
programming by analogy and case-based reasoning are investigated.

In the following (sect. 10.1), we will first give an overview of central con- 
cepts and mechanisms of the field. Afterwards (sect. 10.2), we will introduce 
analogical reasoning for domains with different levels of complexity - from 
proportional analogies to analogical problem solving and planning. Then we 
will discuss programming by analogy as a special case of problem solving 
by analogy (sect. 10.3). Finally (sect. 10.4), we will give some pointers to 
literature. 



10.1 Analogical and Case-Based Reasoning 

10.1.1 Characteristics of Analogy 

Typical of AI and psychological models of analogical reasoning is that
domains or problems are considered as structured objects, such as terms or
relational structures (semantic nets). Structured objects are often represented
as graphs with basic objects as nodes and relations between objects as arcs.
Relations are often divided into object attributes (unary relations),
relations between objects (first order n-ary relations), and higher order
relations.

Based on this representational assumption, Gentner (1983) introduced
a characterization of analogy and contrasted analogy with other modes of
transfer between two domains (see tab. 10.1): In contrast to mere appearance
and literal similarity, analogy is based on mapping the relational structure of
domains rather than object attributes. For the famous Rutherford analogy
(see fig. 10.4) - “The atom is like our solar system” - it is relevant that there
is a central object (sun/nucleus) and objects revolving around this object
(planets/electrons), but it is irrelevant how much these objects weigh or what
their temperature is. The same is true for abstraction, but in contrast to
analogy, the objects of the base domain (central force system) are generalized
concepts rather than concrete instances. Metaphors can be found to be either
a form of similarity-based transfer (comparable to mere appearance) or a form
of analogy where a similarity is created between two previously unconnected
domains (Indurkhya, 1992).^ In contrast to analogy, case-based reasoning
often relies on a simple mapping of domain attributes (Kolodner, 1993).



Table 10.1. Kinds of Predicates Mapped in Different Types of Domain Comparison
(Gentner, 1983, Tab. 1, extended)

                     No. of Attributes   No. of Relations
                     mapped to target    mapped to target    Example

Mere Appearance      Many                Few                 A sunflower is like the sun.
Literal Similarity   Many                Many                The K5 solar system is like our solar system.
Analogy              Few                 Many                The atom is like our solar system.
Abstraction          Few                 Many                The atom is a central force system.
Metaphor             Many                Few                 Her smile is like the sun.
                     Few                 Many                King Louis XIV was like the sun.



Below we will give a hierarchy of analogical reasoning, from simple
proportional analogies where only one relation is mapped to complex analogies
between (planning or programming) problems. In chapter 11 we will introduce
a graph representation for water-jug problems in detail.

A distinction is often made between within- and between-domain
analogies (Vosniadou & Ortony, 1989). For example, using the solution of one
programming problem (factorial) to construct a solution to a new problem
(sum) is considered as within-domain (Anderson & Thompson, 1989) while
transferring knowledge about the solar system to explain the structure of an
atom is considered as between-domain. Because analogy is typically described
as structure mapping (see below), we think that this classification is an
artificial one: When source and target domains are represented as relational
structures and these structures are mapped by a syntactical pattern matching
algorithm, the content of these structures is irrelevant. In case-based
reasoning, base and target are usually from the same domain.

^ Famous are the metaphors of Philip Marlowe. Just to give you one: “The purring
voice was now as false as an usherette’s eyelashes and as slippery as a watermelon
seed.” Raymond Chandler, The Big Sleep, 1939.

10.1.2 Sub-processes of Analogical Reasoning 

Analogical reasoning is typically described by the following, possibly inter- 
acting, sub-processes: 

Retrieval: For a given new target domain/problem a “suitable”, “similar” 
base domain/problem is retrieved (from memory). 

Mapping: The base and target structures are mapped. 

Transfer: The target is enriched with information from the base. 
Generalization: A structure generalizing over base and target is induced. 

Typically, it is assumed that retrieval is based on superficial similarity,
that is, a base problem is selected which shares a high number of attributes
with the new problem. Superficial similarity and structural similarity need
not correspond, and numerous studies have shown that human problem solvers
have difficulties in finding an adequate base problem (Novick, 1988; Ross,
1989).

Most cognitive science models of analogical reasoning focus on modeling
the mapping process (Falkenhainer, Forbus, & Gentner, 1989; Hummel &
Holyoak, 1997). The general assumption is that objects of the base domain
are mapped to objects of the target domain in a structurally consistent way
(see fig. 10.1). The approaches differ with respect to the constraints on
mapping, allowing only first order or also higher order mapping, and
restricting mapping to isomorphism, homomorphism, or weaker relations
(Holland et al., 1986).




Fig. 10.1. Mapping of Base and Target Domain 



Gentner (1983) proposed the following constraints for mapping a base
domain to a target domain:

First Order Mapping: An object in the base can be mapped on a different
object in the target, but base relations must be mapped on identical
relations in the target.

Isomorphism: A set of base objects must be mapped one-to-one and
structurally consistent to a set of target objects. That is, if $r(o_1, o_2)$
holds for the base then $r(f(o_1), f(o_2))$ must hold for the target, where
$f(o)$ is the mapping function.

Systematicity: Mapping of large parts of the relational structure is preferred
over mapping of small isolated parts, that is, relations with greater arity
are preferred.
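
The first two constraints can be checked mechanically for a candidate object mapping. The following is a minimal sketch under our own assumptions, not part of any of the cited systems: relations are encoded as tuples `(name, arg1, ..., argn)`, and the names `image` and `is_structurally_consistent` are hypothetical.

```python
def image(rel, mapping):
    """Translate a base relation into the target by renaming its arguments."""
    name, *args = rel
    return (name, *[mapping[a] for a in args])


def is_structurally_consistent(mapping, base_rels, target_rels):
    """Check one-to-one mapping and identical relations (illustrative only)."""
    # Isomorphism, part 1: no two base objects share a target object.
    if len(set(mapping.values())) != len(mapping):
        return False
    # Isomorphism, part 2 and first order mapping: every base relation over
    # mapped objects must hold, under the same name, in the target.
    for rel in base_rels:
        if all(a in mapping for a in rel[1:]):
            if image(rel, mapping) not in target_rels:
                return False
    return True


base = {("revolves_around", "planet", "sun"),
        ("more_massive", "sun", "planet")}
target = {("revolves_around", "electron", "nucleus"),
          ("more_massive", "nucleus", "electron")}

good = {"sun": "nucleus", "planet": "electron"}
swapped = {"sun": "electron", "planet": "nucleus"}
```

Here `good` passes the check while `swapped` fails, because the swapped mapping would require the nucleus to revolve around the electron.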

Relying on these constraints, mapping corresponds to the problem of finding 
the largest common sub-graph of two graphs (Schädler & Wysotzki, 1999).
If the base is mapped to the target, the parts of the base graph which are 
connected with the common sub-graph are transferred to the target where 
base objects are translated to target objects in accordance with mapping. 
This kind of transfer is also called inference, because relations known in the 
base are assumed to also hold in the target. If the isomorphism constraint is 
relaxed, transfer additionally can involve modifications of the base structure. 
This kind of transfer is also called adaptation (see chap. 11). 
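
Mapping followed by inference can be illustrated on a toy version of the Rutherford analogy. The sketch below is a deliberately naive stand-in for largest-common-sub-graph matching: it enumerates all one-to-one object mappings and keeps the one that preserves the most base relations. It assumes the base has no more objects than the target; the relation encoding and the function names are our own assumptions.

```python
from itertools import permutations


def translate(rel, mapping):
    """Rename the arguments of a base relation into target objects."""
    name, *args = rel
    return (name, *[mapping[a] for a in args])


def best_mapping(base_objs, target_objs, base_rels, target_rels):
    """Exhaustive search for the mapping preserving most base relations."""
    best, best_score = None, -1
    for img in permutations(target_objs, len(base_objs)):
        mapping = dict(zip(base_objs, img))
        score = sum(translate(r, mapping) in target_rels for r in base_rels)
        if score > best_score:
            best, best_score = mapping, score
    return best


def transfer(mapping, base_rels, target_rels):
    """Inference: base relations assumed to hold in the target as well."""
    return {translate(r, mapping) for r in base_rels} - set(target_rels)


base_rels = {("attracts", "sun", "planet"),
             ("more_massive", "sun", "planet"),
             ("revolves_around", "planet", "sun")}
target_rels = {("more_massive", "nucleus", "electron"),
               ("revolves_around", "electron", "nucleus")}

mapping = best_mapping(["sun", "planet"], ["nucleus", "electron"],
                       base_rels, target_rels)
inferred = transfer(mapping, base_rels, target_rels)
```

The search is exponential in the number of objects, which is why realistic systems rely on constraints such as systematicity or on approximate graph matching.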

After successful analogical transfer, the common structure of base and 
target can be generalized to a more general scheme (Novick & Holyoak, 1991). 
For example, the structure of the solar system and the Rutherford atom model 
can be generalized to the more general concept of a central force system. 
To our knowledge, none of the process models in cognitive science models 
generalization learning (see chap. 12). 

In case-based reasoning, mostly only retrieval is addressed and the re- 
trieved case is presented to the system user as a source of information which 
he might transfer to a current problem. 



10.1.3 Transformational versus Derivational Analogy 

The subprocesses described above characterize so-called transformational
analogy. An alternative approach for analogical problem solving, called
derivational analogy, was proposed by Carbonell (1986). He argues that, from a
computational point of view, transformational analogy is often inefficient
and can result in suboptimal solutions. Instead of calculating a base/target
mapping and solving the target by transfer of the base structure, it might
be more efficient to reconstruct the solution process, that is, to use a
remembered problem solving episode as a guideline for solving the new problem.
A problem solving episode consists of the reasoning traces (derivations) of
past solution processes, including the explored subgoal structure and the
methods used. Derivational analogy can be characterized by the following
sub-processes: (1) retrieving a suitable problem solving episode, and
(2) applying the retrieved derivation to the current situation by “replaying”
the problem solving episode, checking for each step whether the derivation is
still applicable in the new problem solving context. Derivational analogy is,
for example, used within the AI planning system Prodigy (Veloso & Carbonell,
1993). An empirical comparison of transformational and derivational analogy
was conducted by Schmid and Carbonell (1999).
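
The replay step can be sketched in a few lines. The following is an illustration under strong simplifying assumptions and does not reproduce Prodigy: an episode is reduced to a list of STRIPS-like steps with precondition, add, and delete sets, and a step that is no longer applicable is simply skipped instead of triggering new planning. All step and fact names are hypothetical.

```python
def replay(episode, state, goal):
    """Replay a stored episode, keeping only steps that are still applicable."""
    state = set(state)
    plan = []
    for step in episode:
        if step["pre"] <= state:                     # applicable in new context?
            state = (state - step["del"]) | step["add"]
            plan.append(step["name"])
        if goal <= state:                            # goal reached, stop replay
            break
    return plan, state


def step(name, pre, add, delete):
    return {"name": name, "pre": set(pre), "add": set(add), "del": set(delete)}


# Stored episode: transport two objects from earth to the moon.
episode = [
    step("load_o1", {"o1@earth", "rocket@earth"}, {"o1@rocket"}, {"o1@earth"}),
    step("load_o2", {"o2@earth", "rocket@earth"}, {"o2@rocket"}, {"o2@earth"}),
    step("fly", {"rocket@earth"}, {"rocket@moon"}, {"rocket@earth"}),
    step("unload_o1", {"o1@rocket", "rocket@moon"}, {"o1@moon"}, {"o1@rocket"}),
    step("unload_o2", {"o2@rocket", "rocket@moon"}, {"o2@moon"}, {"o2@rocket"}),
]

# New problem: only one object has to be transported.
plan, final_state = replay(episode, {"o1@earth", "rocket@earth"}, {"o1@moon"})
```

For the new problem the steps concerning the second object are dropped during replay, and the remaining steps solve the goal.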

10.1.4 Quantitative and Qualitative Similarity

As mentioned above, retrieval is typically based on similarity of attributes.
For comparison of base and target, all kinds of similarity measures defined on
attribute vectors can be used. An overview of measures of similarity and
distance is typically given in textbooks on cluster analysis, such as Eckes and
Roßbach (1980). In psychology, non-symmetrical measures are discussed, for
example, the contrast model of Tversky (1977).

In analogy research, focus is on measures for structural similarity. A
variety of measures are based on the size of the greatest common sub-graph of
two structures. Such a measure for un-labeled graphs was, for example,
proposed by Zelinka (1975): $d(G, H) = |N| - |N_U|$, where $|N|$ is the number
of nodes of the larger graph and $|N_U|$ is the number of nodes in the greatest
common sub-graph of $G$ and $H$. A measure considering the number of nodes
and arcs in the graphs is introduced in chapter 11.
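
For very small graphs the sub-graph based distance can be computed by brute force. The sketch below assumes undirected, un-labeled graphs given as a node list and a set of `frozenset` edges, and it reads "common sub-graph" as common induced sub-graph. The function names are ours and the search is exponential, so this is meant only to make the definition concrete.

```python
from itertools import combinations, permutations


def common_subgraph_size(g, h):
    """Number of nodes of the greatest common induced sub-graph of g and h."""
    (g_nodes, g_edges), (h_nodes, h_edges) = g, h
    for k in range(min(len(g_nodes), len(h_nodes)), 0, -1):
        for sub in combinations(g_nodes, k):
            for img in permutations(h_nodes, k):
                m = dict(zip(sub, img))
                # An edge must exist in g exactly when its image exists in h.
                if all((frozenset((a, b)) in g_edges)
                       == (frozenset((m[a], m[b])) in h_edges)
                       for a, b in combinations(sub, 2)):
                    return k
    return 0


def subgraph_distance(g, h):
    """Nodes of the larger graph minus nodes of the common sub-graph."""
    return max(len(g[0]), len(h[0])) - common_subgraph_size(g, h)


def graph(nodes, edges):
    return (list(nodes), {frozenset(e) for e in edges})


path = graph("abc", [("a", "b"), ("b", "c")])
triangle = graph("xyz", [("x", "y"), ("y", "z"), ("x", "z")])
```

A path and a triangle on three nodes share at most a single edge, that is, two nodes, so their distance is 1.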

Another approach to structural similarity is to consider the number of
transformations which are necessary for making two graphs identical (Bunke
& Messmer, 1994). A measure of transformation distance for trees was, for
example, proposed by Lu (1979): $d(T_1, T_2) = w_s \cdot s + w_d \cdot d +
w_i \cdot i$, where $s$ represents the number of substitutions (renamings of
node-labels), $d$ the number of deletions of nodes, and $i$ the number of
insertions of nodes. Parameters $w$ give operation specific weights. To
guarantee that the measure is a metric, it must hold that $w_s < w_d, w_i$ and
$w_d = w_i$ (proof in Mercy, 1998). Transformational similarity is also the
basis for structural information theory (Leeuwenberg, 1971).
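
As a small worked instance of the weighted sum, the sketch below evaluates the cost of one given sequence of edit operations and checks the stated weight conditions. It does not search for the cheapest sequence, which the actual distance requires. The function name and the default weights are assumptions chosen for illustration.

```python
def edit_cost(s, d, i, w_s=1.0, w_d=2.0, w_i=2.0):
    """Weighted cost of s substitutions, d deletions, and i insertions."""
    # Conditions on the weights as stated in the text for the metric property.
    if not (w_s < w_d and w_s < w_i and w_d == w_i):
        raise ValueError("weights violate w_s < w_d, w_i and w_d = w_i")
    return w_s * s + w_d * d + w_i * i


# Two renamed node labels and one deleted node: 1.0 * 2 + 2.0 * 1 = 4.0
cost = edit_cost(s=2, d=1, i=0)
```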

All measures discussed so far are quantitative, that is, a numerical value is 
calculated which represents the similarity between base and target. Usually, 
a threshold is defined and a domain or problem is considered as a candidate 
for analogical reasoning, if its similarity to the target problem lies above this 
threshold. A qualitative approach to structural similarity was proposed by 
Plaza (1995). Plaza’s domain of application is the retrieval of typed feature 
terms (e. g., records of employees), which can be hierarchically organized, for 
example the field “name” consists of a further feature term with the fields 
“first name” and “surname”. For a given (target) feature term, the most 
similar (base) feature term from a data base is identified by the following 
method: Each pair of the fixed target and a term from the data base is 
anti-unified. The resulting anti-instances are ordered with respect to their 
subsumption relation and the most specific anti-instance is returned. Anti- 
unification was introduced in section 6.3.3 in chapter 6 under the name of 
least general generalization. An anti-unification approach for programming
by analogy is presented in chapter 12. 
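
The qualitative retrieval method can be sketched with plain first-order anti-unification. The code below is an illustration under our own assumptions and is not Plaza's implementation: feature terms are simplified to nested tuples `(functor, arg1, ...)`, constants are strings, and generated variables are strings starting with `X`. A case is returned if its anti-instance is not strictly more general than the anti-instance of another case.

```python
def anti_unify(s, t, subst=None):
    """Least general generalization of two first-order terms."""
    subst = {} if subst is None else subst
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(anti_unify(a, b, subst)
                               for a, b in zip(s[1:], t[1:]))
    # Differing sub-terms: the same pair always gets the same variable.
    if (s, t) not in subst:
        subst[(s, t)] = f"X{len(subst) + 1}"
    return subst[(s, t)]


def subsumes(general, specific, binding=None):
    """True if 'general' can be instantiated to 'specific'."""
    binding = {} if binding is None else binding
    if isinstance(general, str) and general.startswith("X"):
        return binding.setdefault(general, specific) == specific
    if (isinstance(general, tuple) and isinstance(specific, tuple)
            and general[0] == specific[0] and len(general) == len(specific)):
        return all(subsumes(a, b, binding)
                   for a, b in zip(general[1:], specific[1:]))
    return general == specific


def retrieve(target, database):
    """Cases whose anti-instance with the target is most specific."""
    antis = [(case, anti_unify(target, case)) for case in database]
    return [case for case, a in antis
            if not any(subsumes(a, b) and not subsumes(b, a)
                       for _, b in antis)]


target = ("person", ("name", "ann", "smith"), ("dept", "sales"))
case_1 = ("person", ("name", "bob", "smith"), ("dept", "sales"))
case_2 = ("person", ("name", "bob", "jones"), ("dept", "it"))
```

Because subsumption is only a partial order, `retrieve` may return several incomparable candidates rather than a single best case.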



10.2 Mapping Simple Relations or Complex Structures

10.2.1 Proportional Analogies 

The simplest form of analogical reasoning is the proportional analogy of
the form

A:B :: C:? “A is to B as C is to ?”.

Expression A to B is the base domain and expression C to ? is the target
domain. The relation existing between A and B must be identified and applied
to the target domain. Such problems are typically used in intelligence tests,
and algorithms for solving proportional analogies were introduced early in AI
research (Evans, 1968; Winston, 1980). In intelligence tests, items are often
semantic categories such as “Rose is to Flower as Herring is to ?”. In this
example, the relevant relation is subordinate/superordinate and the
superordinate concept to “Herring” must be retrieved (from semantic memory).
Because the solution of such analogies is knowledge dependent, letter strings
(Hofstadter & The Fluid Analogies Research Group, 1995; Burns, 1996) or
simple geometric figures (Evans, 1968; Winston, 1980; O’Hara, 1992) are often
used as an alternative.

An example of an analogy which can be solved with Evans’ Analogy
program (Evans, 1968) is given in figure 10.2. In a first step, each figure is
decomposed into simple geometric objects (such as a rectangle and a
triangle). Because decomposition is ambiguous for overlapping objects, Evans
used Gestalt laws and context information for decomposition. For the pairs
of figures (A, B), (A, C), (B, C), (C, 1), ..., (C, 5) relations between objects
and between spatial positions are constructed. These relations are used for
constructing transformation rules A → B. Transformation includes object
mapping, deletion and insertion. The rules are abstracted such that they can
be applied to figure C, resulting in a new figure which hopefully corresponds
to one of the given alternatives for D.

An algorithm which can solve geometric analogy problems using
context-dependent re-descriptions was proposed by O’Hara (1992). An example is
given in figure 10.3. All figures must be composed of lines. An algebra is
used to represent/construct figures. Operations are translation, rotation,
reflection, scaling of objects and gluing of pairs of objects.

In a first step, an initial representation of figure A is constructed. Then,
a representation of B and a transformation $t$ are constructed such that
$B = t(A)$. A representation of C and an isomorphic mapping $\mu$ are
constructed such that $C = \mu(A)$. Mapping $\mu$ is extended to $\mu'$,
preserving isomorphism, and $D = \mu'(B)$ is constructed. If no mapping
$C = \mu(A)$ can be found, the algorithm backtracks and constructs a
different representation for B (re-description).

Fig. 10.2. Example for a Geometric-Analogy Problem (Evans, 1968, p. 333)

Fig. 10.3. Context Dependent Descriptions in Proportional Analogy (O’Hara,
1992)

Letter string domains are easier to model than geometric domains, and
different approaches to algebraic representation (Leeuwenberg, 1971; Indurkhya,
1992) have been proposed. The Copycat program of Hofstadter and The Fluid
Analogies Research Group (1995) solves letter string analogies of the form
abc : abd :: kji : ?.

10.2.2 Causal Analogies 

In proportional analogies, a domain is represented by two structured objects
A and B with a single relation or a small set of relations $t(A) = B$ which
are relevant for analogical transfer. More complex domains are explanatory
structures, for example for physical phenomena. In the Rutherford analogy,
introduced above, knowledge about the solar system is used to infer the cause
why electrons are rotating around the nucleus of an atom (see fig. 10.4).

Systems realizing mapping and inference for such domains are, for
example, the structure mapping engine (SME, Falkenhainer et al., 1989) or LISA
(Hummel & Holyoak, 1997). The SME is based on the constraints for
structure mapping proposed by Gentner (1983), which were introduced above.
Note that, although the inferred higher-order relation is labeled “cause”,
mapping and inference are purely syntactical, as described for proportional
analogies.







10.2.3 Problem Solving and Planning by Analogy 

Many people have pointed out that analogy is an important mechanism in
problem solving. For example, Polya (1957) presented numerous examples
of how analogical reasoning can help students to learn to prove mathematical
theorems. Psychological studies on analogical problem solving mostly address
solving of mathematical problems (Novick, 1988; Reed, Ackinclose, & Voss,
1990) or program construction (Anderson & Thompson, 1989; Weber, 1996).
Typically, students are presented with worked-out examples which they can
use to solve a new (target) problem. The standard example for programming
is using factorial as base problem to construct sum:

fac(x) = if(eq0(x), 1, *(x, fac(pred(x)))) 
sum(x) = if(eq0(x), 0, +(x, sum(pred(x)))). 
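
The two definitions differ only in the base value and in the combining operator. This shared structure can be made explicit in a sketch where both are instances of one higher-order function. The name `linear_recursion` and the Python rendering are our own illustration and not the term notation used in this book.

```python
import operator


def linear_recursion(base_value, combine):
    """Scheme shared by fac and sum: f(x) = if x == 0 then base_value
    else combine(x, f(x - 1))."""
    def f(x):
        return base_value if x == 0 else combine(x, f(x - 1))
    return f


fac = linear_recursion(1, operator.mul)      # base problem
sum_to = linear_recursion(0, operator.add)   # target obtained by analogy
```

Replacing `1` by `0` and `*` by `+` is exactly the substitution a learner has to find when using factorial as the base problem.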

Anderson and Thompson (1989) propose an analogy model where
programming problems and solutions are represented as schemes with function
slots representing the intended operation of the program and form slots
representing the syntactical realization. For a target problem only the function
slot is filled and the form slot is inferred from the given base problem.

Examples of simple word algebra problems used in experiments by Reed
et al. (1990) are given in table 10.2. Analogical problem solving means
relating the concepts in a given problem to the equation for calculating the
solution. For a new (target) problem, the concepts in the new text have to be
mapped to the concepts in the base problem, and then the equation can be
transferred. Reed et al. (1990) could show that problem solving success is
higher if a base problem which includes the target problem is used (e. g., the
third problem in tab. 10.2 as base and the second as target), but that most
subjects did not select the inclusive problems as most helpful if they could
choose themselves. In chapter 11 we report experiments with different variants
of inclusive problems in the water jug domain.



Table 10.2. Word Algebra Problems (Reed et al., 1990) 



A group of people paid $238 to purchase tickets to a play. How many people were 
in the group if the tickets cost $14 each? 

$14 = $238/n 

A group of people paid $306 to purchase theater tickets. When 7 more people joined 
the group, the total cost was $425. How many people were in the original group if 
all tickets had the same price? 

$306/n = $425/(n+7)

A group of people paid $70 to watch a basketball game. When 8 more people joined
the group the total cost was $120. How many people were in the original group if
the larger group received a 20% discount?

0.8 · ($70/n) = $120/(n+8)
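
The three equations of table 10.2 can be checked with a few lines of exact rational arithmetic. The sketch below simply tests small positive integers for the group size `n`. The helper name `solve` and the search range are assumptions made for this illustration.

```python
from fractions import Fraction


def solve(equation, candidates=range(1, 100)):
    """Return all candidate group sizes n that satisfy the equation."""
    return [n for n in candidates if equation(n)]


# Problem 1: $14 = $238/n
p1 = solve(lambda n: Fraction(238, n) == 14)
# Problem 2: $306/n = $425/(n+7)
p2 = solve(lambda n: Fraction(306, n) == Fraction(425, n + 7))
# Problem 3: 0.8 * ($70/n) = $120/(n+8)
p3 = solve(lambda n: Fraction(8, 10) * Fraction(70, n) == Fraction(120, n + 8))
```

The third equation contains the structure of the second one plus the discount factor, which is what makes the third problem an inclusive base for the second.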



An AI system which addresses problem solving by analogy is Prodigy
(Veloso & Carbonell, 1993). Here, a derivational analogy mechanism is used
(see above). Prodigy-Analogy was, for example, applied to the rocket domain
(see sect. 3.1.4.2 in chap. 3): A planning episode for solving a rocket problem
is stored (e. g., transporting two objects) and indexed with the initial and
goal states. For a new problem (e. g., transporting four objects) the old
episode can be retrieved by matching the current initial and goal states with
the indices of stored episodes, and the retrieved episode is replayed. While
Veloso and Carbonell (1993) could demonstrate efficiency gains of reuse over
planning from scratch for some example domains, Nebel and Koehler
(1995) provided a theoretical analysis showing that in the worst case reuse is
not more efficient than planning from scratch. The first bottleneck is
retrieval of a suitable case and the second is modification of old solutions.


10.3 Programming by Analogy

Program construction can be seen as a special case of problem solving. In part
II we have discussed automatic program synthesis as an area of research which
tries to come up with mechanisms to automate or support - at least routine
parts of - program development. Programming by analogy is a further such
mechanism, which has been discussed since the beginning of automatic
programming research. For example, Manna and Waldinger (1975) claim that
retention of previously constructed programs is a powerful way to acquire and
store knowledge.

Dershowitz (1986) presented an approach to construct imperative
programs by analogical reasoning. A base problem - e. g., calculating the
division of two real numbers with some tolerance - is given by its specification
and solution. The solution is annotated with additional statements (such as
assert, achieve, purpose) which make it possible to relate specification and
program. For a new problem - e. g., calculating the cube-root of a number
with some tolerance - only the specification is given. By mapping the
specifications (see fig. 10.5), the statements in the base program are
transformed into statements of the target program to be constructed. Some
additional inference methods are used to generate a program which is correct
with respect to the specification from the initial program which was
constructed by analogy.

Real-Division:                    Cube-Root:

ASSERT 0 <= c < d, e > 0          ASSERT a >= 0, e > 0
ACHIEVE |c/d - q| < e             ACHIEVE |a^(1/3) - r| < e
VARYING q                         VARYING r

Mapping: q → r, c/d → a^(1/3)

Transformations:

q → r,
u/v → u^(1/3) {replace each division operator by a cube-root operator},
c → a

Fig. 10.5. Base and Target Specification (Dershowitz, 1986)



Both programs are based on the more general strategy of binary search.
As a last step of programming by analogy, Dershowitz proposes to construct
a scheme by abstracting over the programs. For the example above, the
operators for division and cube-root are generalized to an operator variable
applied to (u, v), that is, a second-order variable is introduced. New binary
search problems can then be solved by instantiating the scheme. If program
schemes are acquired, deductive approaches to program synthesis can be
applied, for example, the stepwise refinement approach used in the KIDS system
(see sect. 6.2.2.3 in chap. 6).
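
The idea of one binary search scheme with two instantiations can be sketched as follows. This is our own minimal illustration and does not reproduce the annotated programs of Dershowitz (1986): the operator variable is rendered as a predicate parameter `at_most_target`, and all function names are hypothetical.

```python
def binary_search_scheme(at_most_target, lo, hi, eps):
    """Narrow [lo, hi) around an unknown value x with lo <= x < hi.

    at_most_target(m) must tell whether m <= x. On return, x - lo < eps.
    """
    while hi - lo >= eps:
        mid = (lo + hi) / 2
        if at_most_target(mid):
            lo = mid
        else:
            hi = mid
    return lo


def real_division(c, d, eps):
    """Approximate c/d for 0 <= c < d without using division by d."""
    return binary_search_scheme(lambda q: q * d <= c, 0.0, 1.0, eps)


def cube_root(a, eps):
    """Approximate the cube root of a >= 0."""
    return binary_search_scheme(lambda r: r ** 3 <= a, 0.0, max(1.0, a) + 1.0, eps)
```

Both instances differ only in the test passed to the scheme and in the initial interval, which mirrors the mapping of the two specifications.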

Crucial for analogical programming is to detect relations between pairs
of programs. The theoretical foundation for constructing mappings are
program morphisms (Burton, 1992; Smith, 1993). A technique to perform such
mappings is anti-unification. A completely implemented system for programming
by analogy, based on second-order anti-unification and generalization
morphisms, was presented by Hasker (1995). For a given specification and
program, the system user provides a program derivation (cf. derivational
analogy), that is, steps for transforming the specification into the program.
Second-order anti-unification is used to detect analogies between pairs of
specifications which are represented as combinator terms. Hasker first
introduces anti-unification for monadic combinator terms, that is, allowing
only unary terms. Then, he introduces anti-unification for product types.
Monadic combinator terms are not expressive enough to represent programs,
while combinator terms for cartesian product types - allowing that sub-terms
are deleted - are too general and allow infinitely many minimal
generalizations. Therefore, Hasker introduces relevant combinator terms as a
subset of combinators for cartesian product types which allow introducing
pairs of terms but do not allow ignoring sub-terms. For this class of
combinator terms there still exist, possibly large, sets of minimal
generalizations. Therefore, Hasker introduces some heuristics and allows for
user interaction to guide construction of a “useful” generalization.

Our own approach to programming by analogy is still at its beginning.
We consider how folding of finite program terms into recursive programs (see
chap. 7) can be alternatively realized by analogy or abstraction. That is,
mapping is performed on pairs of finite programs, where for one program
the recursive generalization is known and for the other not (see chap. 12). In
contrast to most approaches to analogical reasoning, retrieval is not based on
attribute similarity but on structural similarity. We use anti-unification for
retrieval (Plaza, 1995) as well as for mapping and transfer (Hasker, 1995).
Calculating the anti-instance of two terms results in their maximally specific
generalization. Our current approach to anti-unification is much more
restricted than the approach of Hasker (1995). Our restricted approach
guarantees that the minimal generalization of two terms is unique. But
retrieval based on identifying the maximally specific anti-instance in a
subsumption hierarchy, as proposed by Plaza (1995, see above), typically
returns sets of candidates for analogical transfer. Here, we must provide
additional information to select a useful base. One possibility is to consider
the size and type of structural overlap between terms. We conducted
psychological experiments to identify some general criteria of structural
base/target relations which allow successful transfer for human problem
solvers (see chap. 11). If a finite program (associated with its recursive
generalization) is selected as base, the term substitutions calculated while
constructing the anti-instance of this program with the target can be applied
to the recursive base program, and thereby the recursive generalization for
the target is obtained (see chap. 12).

We believe that human problem solvers prefer abstract schemes over
concrete base problems to guide solving novel problems and use concrete
problems as guidance only if they are inexperienced in a domain. Therefore, in
our work we focus on abstraction rather than analogy. Our approach to
anti-unification works likewise for concrete program terms and terms containing
object and function variables, that is, schemes. Anti-unifying a new finite
program term with the unfolding of some program scheme results in a further
generalization. Consequently, a hierarchy of abstract schemes develops over
problem solving experience.

Our notion of representing programs as elements of some term algebra
instead of some given programming language (see chap. 7) allows us to address
analogy and abstraction in the domain of programming as well as in more
general domains of problem solving (such as solving blocks-world problems
or other domains considered in planning) within the same approach. That
is, as discussed in chapter 9, we address the acquisition of problem schemes
which represent problem solving strategies (or recursive control rules) from
experience.



10.4 Pointers to Literature 

Analogy and case-based reasoning are extensively researched domains. A
bibliography for both areas can be found at http://www.ai-cbr.org. Classical
books on case-based reasoning are Riesbeck and Schank (1989) and Kolodner
(1993). A good source for current research are the proceedings of the
international conference on case-based reasoning (ICCBR). Cognitive models of
analogy are presented, for example, at the annual conference of the Cognitive
Science Society (CogSci) and in the journal Cognitive Science. A discussion
of metaphors and analogy is presented by Indurkhya (1992). Holland et al.
(1986) discuss induction and analogy from perspectives of philosophy,
psychology, and AI. A collection of cognitive science papers on similarity and
analogy was presented by Vosniadou and Ortony (1989). Some AI papers on
analogy can be found in Michalski et al. (1986).


“And what is science about?” “He explained that scientists formulate theories
about how the physical world works, and then test them out by experiments.
As long as the experiments succeed, then the theories hold. If they fail, the
scientists have to find another theory to explain the facts. He says that, with
science, there’s this exciting paradox, that disillusionment needn’t be defeat.
It’s a step forward.”

— P. D. James, Death of an Expert Witness, 1977 



In this chapter, we present two psychological experiments to demonstrate that
human problem solvers can and do use not only base (source) domains which
are isomorphic to target domains but also rely on non-isomorphic structures
in analogical problem solving. Of course, not every non-isomorphic relation
is suitable for analogical transfer. Therefore, we identified which degree of
structural overlap must exist between two problems to guarantee a high
probability of transfer success. This research is not only of interest in the
context of theories of human problem solving. In the context of our system
IPAL, criteria are needed for deciding whether it is worthwhile to try to
generate a new recursive program by analogical transfer of a program (or by
instantiating a program scheme) given in memory or to generate a new solution
from scratch, using inductive program synthesis. In the following, we first
(sect. 11.1) give an introduction to psychological theories of analogical
problem solving and present the problem domain used in the experiments. Then
we report two experiments on analogical transfer of non-isomorphic source
problems (sect. 11.2 and sect. 11.3). We conclude with a discussion of the
empirical results and possible areas of application (sect. 11.4).^



11.1 Analogical Problem Solving 

Analogical reasoning is an often used strategy in everyday and academic
problem solving. For example, if a person already has experience in planning
a trip by train, he/she might transfer this knowledge to planning a trip by
plane. If a student already has knowledge in solving an equation with one
additive variable, he/she might transfer the solution procedure to an equation
with a multiplicative variable. For analogical transfer, a previously solved
problem - called source - has to be similar to the current problem - called
target. While a large number of common attributes might help to find an
analogy, source and target have to be structurally similar for transfer success
(Holyoak & Koh, 1987). In the ideal case, source and target are structurally
identical (isomorphic) - but this is seldom true in real-life problem solving.

^ This chapter is based on the papers Schmid, Wirth, and Polkehn (2001) and
Schmid, Wirth, and Polkehn (1999).

U. Schmid: Inductive Synthesis of Functional Programs, LNAI 2654, pp. 291-310, 2003.

Analogical problem solving is commonly described by the following
(possibly interacting) component processes (e. g., Keane, Ledgeway, & Duff,
1994): representation of the target problem, retrieval of a previously solved
source problem from memory, mapping of the structures of source and target,
transfer of the source solution to the target problem, and generalizing over
the common structure of source and target. The empirically best explored
processes are retrieval and mapping (see Hummel & Holyoak, 1997, for an
overview). Retrieval of a source is assumed to be guided by overall semantic
similarity (i. e., common attributes), often characterized as “superficial” in
contrast to structural similarity (Gentner & Landers, 1985; Holyoak & Koh,
1987; Ross, 1989). Empirical results show that retrieval is the bottleneck in
analogical reasoning and often can only be performed successfully if explicit
hints about a suitable source are given (Gick & Holyoak, 1980; Gentner,
Rattermann, & Forbus, 1993). Therefore, a usual procedure for studying mapping
and transfer is to circumvent retrieval by explicitly presenting a problem as
a helpful example (Novick & Holyoak, 1991). In the following, we will take a
closer look at mapping and transfer.



11.1.1 Mapping and Transfer 

Mapping is considered the core process in analogical reasoning. The
decision whether two problems are analogous is based on identifying structural
correspondences between them. Mapping is a necessary but not always
sufficient condition for successful transfer (Novick & Holyoak, 1991). There
are numerous empirical studies concerning the mapping process (cf. Hummel
& Holyoak, 1997) and all computational models of analogical reasoning
provide an implementation of this component (Falkenhainer et al., 1989; Keane
et al., 1994; Hummel & Holyoak, 1997). Mapping is typically modelled as
first identifying the corresponding components of source and target and then
carrying over the conceptual structure from the source to the target. For
example, in the Rutherford analogy (the atom is like the solar system), planets
can be mapped to electrons and the sun to the nucleus of an atom together
with relations such as “revolves around” or “more mass than” (Gentner, 1983).
In the structure mapping theory (Gentner, 1983) it is postulated that
mapping is performed purely syntactically and that it is guided by the
principle of systematicity - preferring mapping of greater portions of
structure to mapping of isolated elements. Alternatively, Holyoak and
colleagues postulate that mapping is constrained by semantic and pragmatic
aspects of the problem (Holyoak & Thagard, 1989; Hummel & Holyoak, 1997).
Mapping might be further constrained such that it results in easy adaptability
(Keane, 1996). Currently, it is discussed that a target might be
re-represented if source/target mapping cannot be performed successfully
(Hofstadter & The Fluid Analogies Research Group, 1995; Gentner, Brem,
Ferguson, Markman, Levidow, Wolff, & Forbus, 1997).

Based on the mapping of source and target, the conceptual structure of 
the source can be transferred to the target. For example, the explanatory 
structure that the planets revolve around the sun because the sun attracts
the planets might be transferred to the domain of atoms. Transfer can be 
faulty or incomplete, even if mapping was performed successfully (Novick 
& Holyoak, 1991). Negative transfer can also result from a failure in prior 
sub-processes - construction of an unsuitable representation of the target, 
retrieval of an inappropriate source problem, or incomplete, inconsistent or 
inappropriate mapping of source and target (Novick, 1988). Analogical trans- 
fer might lead to the induction of a more general schema which represents an 
abstraction over the common structure of source and target (Gick & Holyoak, 
1983). For example, when solving the Rutherford analogy, the more general 
concept of central force systems might be learned. 

11.1.2 Transfer of Non-isomorphic Source Problems 

Our work focusses on analogical transfer in problem solving. There is a 
marginal and a crucial difference between general models of analogical reason- 
ing and models of analogical problem solving. While in general source and 
target might be from different domains (between-domain analogies as the 
Rutherford analogy), in analogical problem solving source and target typi- 
cally are from the same domain (within-domain analogies, e. g., Vosniadou 
& Ortony, 1989). For example, people can use a previously solved algebra 
word problem as an example to facilitate solving a new algebra word problem
(Novick & Holyoak, 1991; Reed et al., 1990), or they can use a computer
program with which they are already familiar as an example to construct 
a new program (Anderson & Thompson, 1989). While the discrimination of 
between- and within-domain analogies is relevant for the question of how a 
suitable source can be retrieved, it has no impact on structure mapping if 
this process is assumed to be performed purely syntactically. 

The more crucial difference between models of analogical reasoning and
of problem solving is that in analogical reasoning transfer is mostly described
by inference (in so-called explanatory analogies, e.g., Gentner, 1983) vs. by
adaptation (in problem solving, e.g., Keane, 1996). In the first case,
(higher-order) relations given for the source are carried over to the target - as
the explanation given above of why electrons revolve around the nucleus. In
analogical problem solving, on the other hand, most often the complete solution
procedure of the source problem is adapted to the target. Analogical transfer
of a problem solution subsumes the structural “carry-over” along with possible
changes (adaptation) of the solution structure and the application of
the solution procedure. For example, if we are presented with the necessary
operations to isolate a variable in an equation, we can solve a new equation
by adapting the known solution procedure. If structures of source and target
are identical (isomorphic), transfer can be described as simply replacing the
source concepts by the target concepts in the source solution. For a source
equation $2 \cdot x + 5 = 9$ with solution $x = (9 - 5)/2$, the target
$3 \cdot x + 4 = 16$ can be solved by (1) mapping the numbers of source and
target, that is, 2 is mapped to 3, 5 to 4, and 9 to 16, and by (2) substituting
the corresponding numbers in the source solution. An example of a source
inclusive source/target pair mapping is given below in figure 11.3.
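
The two steps of isomorphic transfer described above can be illustrated with a minimal sketch. It assumes that the source solution procedure for an equation of the form a·x + b = c is "subtract b, then divide by a", and it reads the mapping as source number to target number (2 to 3, 5 to 4, 9 to 16). The names `solve_linear`, `source`, and `mapping` are hypothetical and serve only this illustration.

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Source solution procedure for a*x + b = c: subtract b, then divide by a."""
    return Fraction(c - b, a)

# Source problem 2*x + 5 = 9, described by the roles of its numbers.
source = {"a": 2, "b": 5, "c": 9}

# Step 1: one-to-one mapping of source numbers to target numbers.
mapping = {2: 3, 5: 4, 9: 16}

# Step 2: substitute the mapped numbers into the source solution.
target = {role: mapping[value] for role, value in source.items()}

x_source = solve_linear(**source)   # (9 - 5) / 2
x_target = solve_linear(**target)   # (16 - 4) / 3
print(x_source, x_target)
```

Because the structures are identical, nothing but the numbers changes between the two calls; non-isomorphic pairs, discussed next, would additionally require changing the procedure itself.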

We are especially interested in conditions for successful transfer of
non-isomorphic source solutions. There are a variety of non-isomorphic
source/target relations discussed in the literature: First, there are different
types of mapping relations: one-to-one mappings (isomorphism), many-to-one, and
one-to-many mappings (Spellman & Holyoak, 1996). Secondly, there are different
types and degrees of structural overlap (see fig. 11.1): a source might be
“completely contained” in the target (source inclusiveness; Reed et al., 1990), or
a source might represent all concepts needed for solving the target together
with some additional concepts (target exhaustiveness; Gentner, 1980). These
are two special cases of structural overlap between source and target. It seems
plausible to assume that if the overlap is too small, a problem is no longer
helpful for solving the target. Such a problem would not be characterized
as a source problem. While there are some empirical studies investigating
transfer of non-isomorphic sources (Reed et al., 1990; Novick & Hmelo, 1994;
Gholson, Smither, Buhrman, Duncan, & Pierce, 1996; Spellman & Holyoak,
1996), there is no systematic investigation of the structural relation between
source and target which is necessary for successful transfer. Our experimental
work focusses on the impact of different types and degrees of structural overlap
on transfer success, that is, we currently are only considering one-to-one
mappings.

11.1.3 Structural Representation of Problems 

To determine the structural relation between source and target we have to
rely on explicitly defined representations of problem structures. In cognitive
models of analogical reasoning, problems are typically represented by schemas
(SME, Falkenhainer et al., 1989), (IAM, Keane et al., 1994) or by semantic
nets (ACME, Holyoak & Thagard, 1989), (LISA, Hummel & Holyoak, 1997).
From a more abstract view, these representations correspond to graphs, where
concepts are represented as nodes and relations between them as arcs. Examples
for graphs are given in figure 11.1. For actual problems, nodes (and
possibly arcs) are labelled. A graph representation of the solar system contains
for instance a node labelled with the relation more mass than connected
to a node planet-1 and to a node sun.

Fig. 11.1. Types and degrees of structural overlap between source and target
problems

While explicit representations are often presented for explanatory analogies
(Gick & Holyoak, 1980; Gentner, 1983), this is not true for problem
solving. For algebra problems (Novick & Holyoak, 1991; Reed et al., 1990),
the mathematical equations can be used to represent the problem structure
(see fig. 11.3). In general - when investigating such problems as the Tower of
Hanoi (Clement & Richard, 1997; Simon & Hayes, 1976) - both the structure
of a problem and the problem solving operators, possibly together with
application conditions and constraints (Gholson et al., 1996), have to be taken
into account.

In the classical transformational view of analogical problem solving
(Gentner, 1983), little work has been done which addresses how to model
analogical transfer of problems involving several solution steps. In artificial
intelligence, Carbonell (1986) proposed derivational analogy for multi-step
problems: He models problem solving by analogy as deriving a solution by replay
of an already known solution process, checking on the way whether the conditions
for operator applications of the source still hold when solving the target.
In the following, we nevertheless adopt the transformational approach -
describing analogical problem solving by mapping and transfer. That is, we will
assume that both the (declarative) description of the problem and procedural
information are structurally represented and that a target problem is solved
by adapting the source solution to the target problem based on structure
mapping. We assume that problems and solutions are represented in schemas
capturing declarative as well as procedural aspects as argued for example by
Anderson and Thompson (1989) and Rumelhart and Norman (1981).

When specifying the representation of a problem, we have to decide on 
its format as well as its content (Gick & Holyoak, 1980). In general, it is not 
possible to determine all possible aspects associated with a problem, that 
is, we cannot claim complete representations. We adopt the position of Gick
and Holyoak and model at least all aspects which are relevant for successful
transfer. A component of the (declarative) description of a problem is rele- 
vant, if it is necessary for generating the operation sequence which solves the 
problem. Furthermore, only the operation sequence which solves the problem 
is regarded as relevant procedural information. A successful problem solver 
has to focus on these relevant aspects of a problem and should ignore all 
other aspects. Of course, we do not assume that human problem solvers in 
general represent only relevant or all relevant aspects of a problem. Our goal 
is to systematically control variants of structural source/target relations and 
their impact on transfer, that is, our representational assumptions are not 
empirical claims but a means for task analysis. We want to construct “nor- 
matively complete” graph representations of problems to explore the impact 
of different analytically given structural source/target relations on empiri- 
cally observable transfer success. 

In the following, we will first introduce our problem solving domain - 
water redistribution tasks - and our problem representations. Then we will 
present two experiments. In the first experiment we will show that problem 
solvers can transfer a source solution with moderate structural similarity to 
the target if the problems do not vary in superficial features. In the second 
experiment we investigate a variety of different structural overlaps between 
source and target. 

11.1.4 Non-isomorphic Variants in a Water Redistribution Domain

Because we focus on transfer of declarative and procedural aspects of problems
we constructed a problem type that can be classified as interpolation
problems like the Tower of Hanoi problems (Simon & Hayes, 1976), the water
jug problems (Atwood & Polson, 1976), or missionary-cannibal problems
(Reed, Ernst, & Banerji, 1974; Gholson et al., 1996). Problem solving means
to find the correct multi-step sequence of operators that transform an initial
state into the goal state. Interpolation problems have well-defined initial
and goal states and usually one well-defined multi-step solution. Thus, they
are as suitable for systematically analyzing their structure as, for instance,
mathematical problems (Reed et al., 1990) with the advantage that they are
not likely to activate school-trained mathematical pre-knowledge.

We constructed a water redistribution domain that is similar to but more
complex than the water jug problems described by Atwood and Polson (1976).
In the initial state three (or four) jugs of different capacity are given. The
jugs are initially filled with different amounts of water (initial quantities).
The water has to be redistributed between the jugs in such a way that the
pre-specified goal quantities are obtained. For example, given are the three
jugs A, B and C with capacities $c_A = 36$, $c_B = 45$, and $c_C = 54$ (units). In
the initial state quantities are $q_A = 16$, $q_B = 27$, and $q_C = 34$. To reach the
goal state the values of these quantities must be transformed into $q_A = 25$,
$q_B = 0$ and $q_C = 52$ by redistributing the water among the different jugs (see
fig. 11.2).



name/position       A     B     C
goal quantity       25    0     52
initial quantity    16    27    34
capacity            36    45    54

Solution: pour(C,B), pour(B,A), pour(A,C), pour(B,A)

Fig. 11.2. A water redistribution problem



The task is to determine the shortest sequence of operators that trans- 
form the initial quantities into the goal quantities. The only legal operator 
available is a pour-operator (the redistribute operator) that is restricted by 
the following conditions: (1) The only water to pour is the water contained 
by the jugs in the initial state. (2) Water can be poured only from a non-empty
‘pour out’-jug into an incompletely filled ‘pour in’-jug. (3) Pouring
always results in either filling the ‘pour in’-jug up to its capacity with possibly 
leaving a rest of water in the ‘pour out’-jug or emptying the ‘pour out’-jug 
with possibly remaining free capacity in the ‘pour in’-jug. (4) The amount 
of water that is poured out of the ‘pour out’-jug is always the same amount 
that is filled in the ‘pour in’-jug. Formally this pour-operator is defined in 
the following way: 

IF not($q_X(t) = 0$) AND not($q_Y(t) = c_Y$) THEN pour(X,Y) resulting in:

    IF $q_X(t) < c_Y - q_Y(t)$
    THEN $q_Y(t+1) := q_Y(t) + q_X(t)$
         $q_X(t+1) := q_X(t) - q_X(t)$ (i.e., 0, emptying jug X)
    ELSE $q_X(t+1) := q_X(t) - (c_Y - q_Y(t))$
         $q_Y(t+1) := q_Y(t) + (c_Y - q_Y(t))$ (i.e., $c_Y$, filling jug Y)

with

$q_X(t)$: quantity of jug X at solution step t,
$c_X$: capacity of jug X,
$(c_X - q_X(t))$: remaining free capacity of jug X.
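
The operator definition can be made executable in a few lines. The following is a minimal sketch, not the software used in the experiments. The function name `pour` follows the text, while the dictionary-based state representation and the use of `None` for "operator not applicable" are assumptions made for this illustration. Both branches of the definition collapse into one rule: the amount moved is the smaller of the content of jug X and the free capacity of jug Y.

```python
def pour(state, capacity, x, y):
    """Apply pour(X,Y) to a state (dict: jug -> quantity) and return the new state.

    Returns None if the operator is not applicable, that is, if jug X is
    empty or jug Y is already filled up to its capacity.
    """
    if state[x] == 0 or state[y] == capacity[y]:
        return None
    free = capacity[y] - state[y]     # remaining free capacity of jug Y
    amount = min(state[x], free)      # empties X or fills Y, whichever comes first
    new_state = dict(state)
    new_state[x] -= amount
    new_state[y] += amount
    return new_state

# Example problem of figure 11.2.
capacity = {"A": 36, "B": 45, "C": 54}
state = {"A": 16, "B": 27, "C": 34}
for x, y in [("C", "B"), ("B", "A"), ("A", "C"), ("B", "A")]:
    state = pour(state, capacity, x, y)
    print(f"pour({x},{y}) -> {state}")
```

Replaying the four operators of the example solution ends in the goal quantities 25, 0, and 52 for jugs A, B, and C.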

Because we are interested in which types and degrees of structural overlap 
are sufficient for successful analogical transfer, we have to ensure that subjects 
really refer to the source for solving the target problem. That is, the problems 
should be complex enough to ensure that the correct solution can not be 
found by trial and error, and difficult enough to ensure that the abstract 
solution principle is not immediately inferable. Therefore, we constructed 
redistribution problems for which there exists only a single (for two problems two)
shortest operator sequence (in problem spaces with over 1000 states and more 
than 50 cycle-free solution paths). 
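
How a shortest operator sequence can be determined for such a problem may be sketched with a breadth-first search over the problem space. This is an illustrative assumption about one possible procedure, not a description of how the experimental problems were actually constructed. States are tuples of quantities, operators are index pairs (from, to), and the helper names `successors` and `shortest_solution` are hypothetical.

```python
from collections import deque

def successors(state, caps):
    """Yield (operator, next_state) for every applicable pour(X,Y)."""
    n = len(state)
    for x in range(n):
        for y in range(n):
            if x == y or state[x] == 0 or state[y] == caps[y]:
                continue
            amount = min(state[x], caps[y] - state[y])
            nxt = list(state)
            nxt[x] -= amount
            nxt[y] += amount
            yield (x, y), tuple(nxt)

def shortest_solution(initial, goal, caps):
    """Breadth-first search; returns one shortest operator sequence or None."""
    parent = {initial: None}          # state -> (previous state, operator)
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while parent[state] is not None:
                state, op = parent[state]
                path.append(op)
            return path[::-1]
        for op, nxt in successors(state, caps):
            if nxt not in parent:
                parent[nxt] = (state, op)
                queue.append(nxt)
    return None

names = "ABC"
plan = shortest_solution((16, 27, 34), (25, 0, 52), (36, 45, 54))
print([f"pour({names[x]},{names[y]})" for x, y in plan])
```

For the example problem of figure 11.2 the search returns the four-step sequence given in the figure. Breadth-first search guarantees minimal length but reports only one shortest sequence, so checking uniqueness would require collecting all paths of that length.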

To construct a structural representation of a problem we were guided by 
the following principles: (a) the goal quantity of each jug can be described 
as the initial quantity transformed by a certain (shortest) sequence of oper- 
ators; (b) relevant declarative attributes (capacities, quantities and relations 
between these attributes) of the initial and the goal state determine the solu- 
tion sequence of operators; and (c) each solution step can be described by the 
definition of the pour-operator given above. In terms of these principles oper- 
ator applications can be re-formulated by equations where a current quantity 
can be expressed by adding or subtracting amounts of water. For example, 
using the parameters introduced above, the first operator pour(C,B) of the 
example presented in figure 11.2 transforms the quantities of jug B and C 
of the initial state t = 0 into quantities of state t = 1 in the following way: 
$q_B(0) = 27$, $q_C(0) = 34$, $c_B = 45$, $c_C = 54$:

BECAUSE not(34 = 0) AND not(27 = 45)

pour(C,B) at t = 0 results in:

BECAUSE not(34 < (45 - 27)):
$q_B(1) = 27 + (45 - 27) = 45$
$q_C(1) = 34 - (45 - 27) = 16$.

Redistribution problems define a specific goal quantity for each jug. For 
this reason, there have to be as many equations constructed as there are jugs 
involved. This group of equations represents all relevant procedural aspects 
of the problem, that is, it represents the sequence of operators leading to the 
goal state. For example, the three equations of the three jugs in figure 11.2 
are presented in table 11.1.

Table 11.1. Relevant information for solving the source problem

(a) Procedural

Operator sequence: pour(C,B), pour(B,A), pour(A,C), pour(B,A)

$q_A(4) = q_A(0) + (c_A - q_A(0)) - c_A + [c_B - (c_A - q_A(0))]$
$q_B(4) = q_B(0) + (c_B - q_B(0)) - (c_A - q_A(0)) - [c_B - (c_A - q_A(0))]$
$q_C(4) = q_C(0) - (c_B - q_B(0)) + c_A$

(b) Declarative (constraints)

$c_B > (c_A - q_A(0))$
$c_A = 2 \cdot (c_B - q_B(0))$
$q_C(0) > (c_B - q_B(0))$
$c_C < q_C(0) + c_A$



Each goal state is expressed by the values given in the initial state. On 
the right side of the equality sign these given values are combined in a way 
that transforms the initial quantity of each jug into its goal quantity. Cer- 
tain constraints between the initial values have to be satisfied so that these 
equations are balanced. These constraints can be analytically derived from 
the equations given in table 11.1 and they constitute the relevant declarative 
attributes of the problem. The constraints for water redistribution problems 
have the same function as the problem solving constraints given for exam- 
ple for the radiation or fortress problems (Holyoak & Koh, 1987) or the 
missionary-cannibal problems (Reed et al., 1974; Gholson et al., 1996).

As an example of how these constraints can be derived from the equations,
you can easily see that the last pour operator of the solution (pouring a
certain amount of water from B into A) is only executable if the relation
$c_B > (c_A - q_A(0))$ holds. As a second constraint, the goal quantity of jug C
could be described by the following equation: $q_C(4) = q_C(0) + (c_B - q_B(0))$.
But the expression $(c_B - q_B(0))$ does not represent the quantity in jug B. It
represents the remaining free capacity of this jug which, of course, cannot
be poured into another jug. Because the relation $c_A = 2 \cdot (c_B - q_B(0))$ holds,
we conclude that the value of the remaining free capacity of jug B has to be
subtracted from the quantity of jug C (only possible if $q_C(0) > (c_B - q_B(0))$
holds) and that the double of this value ($c_A$) has to be added to the quantity
in jug C by pouring the capacity of jug A into jug C. Additionally, the relation
$c_C < q_C(0) + c_A$ determines the relative order of the two pour-operators: you
have to subtract $(c_B - q_B(0))$ before you can add $c_A$ to $q_C(0)$.
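
As a quick numerical check, the equations and constraints of table 11.1 can be evaluated for the example problem of figure 11.2. The sketch below encodes them as read from the table. The variable names (`c`, `q0`, `free_A`, `free_B`, `q4`, `constraints`) are hypothetical and chosen only for this illustration.

```python
c = {"A": 36, "B": 45, "C": 54}     # capacities
q0 = {"A": 16, "B": 27, "C": 34}    # initial quantities
goal = {"A": 25, "B": 0, "C": 52}   # goal quantities

free_A = c["A"] - q0["A"]           # remaining free capacity of jug A at t = 0
free_B = c["B"] - q0["B"]           # remaining free capacity of jug B at t = 0

# (a) Procedural: one equation per jug, terms in the order of the operators.
q4 = {
    "A": q0["A"] + free_A - c["A"] + (c["B"] - free_A),
    "B": q0["B"] + free_B - free_A - (c["B"] - free_A),
    "C": q0["C"] - free_B + c["A"],
}

# (b) Declarative: constraints between the initial values.
constraints = {
    "c_B > c_A - q_A(0)": c["B"] > free_A,
    "c_A = 2 * (c_B - q_B(0))": c["A"] == 2 * free_B,
    "q_C(0) > c_B - q_B(0)": q0["C"] > free_B,
    "c_C < q_C(0) + c_A": c["C"] < q0["C"] + c["A"],
}

print(q4)
for name, holds in constraints.items():
    print(name, holds)
```

All three equations evaluate to the goal quantities and all four constraints hold for this problem, which is what makes the four-step operator sequence applicable.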

The equations describing the transformations for each jug and the
(in-)equations describing the constraints of the problem are sufficient to
represent all relevant declarative and procedural information of the problem.
Thus, transforming all of them into one graphical representation leads to a
normatively complete representation of the problem which can be used for a
task analytical determination of the overlap between two problem structures.
The equations and in-equations for all water redistribution problems used in
our experiments are given in appendix CC.7.



11.1.5 Measurement of Structural Overlap 

Structural similarity between two graphs G and H is usually calculated as the
size of the greatest common subgraph of G and H in relation to the size of the
greater of both graphs (Schädler & Wysotzki, 1999; Bunke & Messmer, 1994).
To calculate graph distance by formula 11.1 we introduce directed “empty”
arcs between all pairs of nodes where no arcs exist. The size of the common
subgraph is expressed by the sum of common arcs $V_{GH}$ and nodes $N_{GH}$.



$$d(G,H) = 1 - \frac{V_{GH} + N_{GH}}{\max(V_G, V_H) + \max(N_G, N_H)} \qquad (11.1)$$



The graph distance can assume values between 0 and 1, indicating isomorphic
relations between two graphs with $d(G,H) = 0$ and no systematic
relation between G and H with $d(G,H) = 1$. For the partial isomorphic
graphs in figure 11.3 we obtain (for graph 11.3.a as G and graph 11.3.b as H)

$V_G = 42$, $V_H = 72$ ($V_G < V_H$)
$N_G = 7$, $N_H = 9$ ($N_G < N_H$)
$V_{GH} = 30$, $N_{GH} = 6$

resulting in the difference between G and H

$$d(G,H) = 1 - \frac{30 + 6}{72 + 9} \approx 0.56.$$

The value of $d(G,H) \approx 0.56$ indicates that the size of the common subgraph
of G and H is a little less than half of the size of the larger graph H. Of
course, this absolute value of the distance between two problems is highly
dependent on the kind of representation of their structures. Thus, for task
analysis only the ordinal information of these values should be considered.
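
Formula 11.1 is easy to evaluate mechanically. The following minimal sketch assumes that the arc and node counts are already known, so it does not compute the greatest common subgraph itself. The function names `graph_distance` and `complete_arcs` are hypothetical. The helper reflects an observation about the numbers reported for figure 11.3: after adding “empty” arcs between all pairs of nodes, a graph with n nodes has n·(n − 1) directed arcs, which matches 42, 72, and 30 for 7, 9, and 6 nodes.

```python
def complete_arcs(n):
    """Number of directed arcs between all ordered pairs of n distinct nodes."""
    return n * (n - 1)

def graph_distance(v_g, n_g, v_h, n_h, v_gh, n_gh):
    """Graph distance of formula 11.1 from arc counts (v_*) and node counts (n_*)."""
    return 1 - (v_gh + n_gh) / (max(v_g, v_h) + max(n_g, n_h))

# Counts for the graphs of figure 11.3 (G: 7 nodes, H: 9 nodes, 6 common nodes).
d = graph_distance(
    v_g=complete_arcs(7), n_g=7,
    v_h=complete_arcs(9), n_h=9,
    v_gh=complete_arcs(6), n_gh=6,
)
print(round(d, 3))
```

With these counts the distance is 1 − 36/81, that is, about 0.556. Identical graphs yield a distance of 0, since the common subgraph then has the size of either graph.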



11.2 Experiment 1 

Experiment 1 was designed to investigate the suitability of the water
redistribution domain for studying analogical transfer in problem solving, to get some
initial information about transfer of isomorphic vs. non-isomorphic sources, 
and to check for possible interaction of superficial with structural similarity. 

The problems were constructed in such a way that it is highly improbable
that the correct optimal operator-sequence can be found by trial-and-error
or that the general solution principle is immediately inferable. To investigate
analogical transfer, information about mapping of source and target can
be given before the target problem has to be solved (Novick & Holyoak,
1991). This can be done by pointing out the relevant properties of a problem
(conceptual mapping) and by giving information about the corresponding
jugs in source and target (“numerical” mapping). Additionally, information
about the problem solving strategy of subjects can be obtained by analyzing
log-files of subjects’ problem solving behavior and by testing mapping after
subjects solved the target problem.

Fig. 11.3. Graphs for the equations $2 \cdot x + 5 = 9$ (a) and $3 \cdot x + (6 - 2) = 16$ (b)

To get an indication of the degree of structural similarity between a 
source and a target which is necessary for transfer success, an isomorphic 
source/target pair was contrasted with a partial isomorphic source/target 
pair with “moderately high” structural overlap. This should give us some 
information about the range of source/target similarities which should be 
investigated more closely (in experiment 2). 

To control possible interactions of structural and superficial similarity, we 
discriminate structure preserving and structure violating variants of target 
problems (Holyoak & Koh, 1987): For a given source problem with three jugs 
(see fig. 11.2), a target problem with four jugs clearly changes the superficial
similarity in contrast to a target problem consisting also of three jugs. But this 
additional jug might or might not result in a change of the problem structure - 
reflected in the sequence of pour operations necessary for solving the problem. 
In contrast, other superficial variations - like changing the sequence of jugs 
from small/medium/large to large/medium/small - are structure preserving, 
but clearly affect superficial similarity. If the introduction of an additional 
jug does not lead to additional deviations from the surface appearance of the 
source problem, we regard the surface as “stable”. As a consequence, there 
are four possible source/target variations: structure preserving problems with 
stable or changed surface and structure violating problems with stable or 
changed surface.

11.2.1 Method

Material As source problem, the problem given in figure 11.2 was used. 
We constructed five different redistribution problems as target problems (see 
appendix CC.7): 

Problem 1: a three jug problem solvable with four operators which is isomorphic
to the source problem (condition isomorph/no surface change),

Problem 2: a three jug problem solvable with four operators which is isomorphic
to the source problem, but has a surface variation by switching
positions of the small jug (A) and the medium jug (B) and renaming
these jugs accordingly (A → B, B → A) (condition isomorph/small surface
change),

Problem 3: a three jug problem solvable with four operators which is isomorphic
to the source problem, but has a surface variation by switching
positions of all jugs (A → B, B → C, C → A) (condition isomorph/large
surface change),

Problem 4: a four jug problem solvable with five operators which has a moderately
high structural overlap with the source (condition partial isomorph/no
surface change), and

Problem 5: a four jug problem solvable with five operators which is isomorphic
to problem 4, but has a surface variation by switching positions of two
jugs (A → B, B → A) (condition partial isomorph/small surface change).

Because of the exploratory nature of this first experiment, we did not in- 
troduce a complete crossing of structure and surface similarity. The main 
question was, whether subjects could successfully use a partial isomorph in 
analogical transfer. 

In addition to the source and target problems, an “initial” problem which 
is isomorphic to the source was constructed. This problem was introduced 
before the presentation of the source problem for the following reasons: first, 
subjects should become familiar with interacting with the problem solving 
environment (the experiment was fully computer-based, see below); and second,
subjects should be “primed” to use analogy as a solution strategy, by
being shown how the source problem could be solved with help of
the initial problem.

Subjects Subjects were 60 pupils (31 male and 29 female) of a gymnasium 
in Berlin, Germany. Their average age was 17.4 years (minimum 14 and 
maximum 19 years). 

Procedure The experiment was fully computer based and conducted at 
the school’s PC-cluster. The overall duration of an experimental session was 
about 45 minutes. All interactions with the program were recorded in log-files. 
One session consisted of the following parts: 

Instruction and Training. First, general instructions were given, informing
about the following tasks and the water-redistribution problems. Subjects
were introduced to the setting of the screen-layout (graphics of the jugs)
and the handling of interactions with the program (performing a pour-operation,
un-doing an operation).

Initial problem. Afterwards, the subjects were asked to solve the initial problem
with tutorial guidance (performed by the program). In case of a correct
solution the tutor asked the subject to repeat it without any error. The
tutor intervened if the subject had performed four steps without success,
had started two new attempts to solve the problem, or needed
longer than three minutes. This part was finished when the problem was
correctly solved twice without tutorial help.

Source problem. When introducing the source problem, first some hints
about the relevant problem aspects for figuring out a shortest operator-sequence
were given (by thinking about the goal quantities in terms of
relations to initial quantities and maximum capacities). Afterwards, the
correspondence between the three jugs of the initial problem and the
three jugs of the source problem was pointed out. Now, the screen offered
an additional button to retrieve the solution sequence of the initial
problem. The initial solution could be retrieved as often as the subject
desired. To perform an operation for solving the source problem, this
window had to be closed. Tutorial guidance was identical to the initial
problem, but subjects had to solve the source problem only once.

Target problem. Every subject randomly received one of the five target problems.
Again, mapping hints (relevant problem aspects, correspondence
between jugs of the source and the target problem) were given. The
source solution could be retrieved without limit, but, again, the subjects
could only proceed to solve the target problem after this window was
closed. Thereby, the number and time of references to the source problem
could be obtained in log-files. The subjects had a maximum of 10 minutes
to solve the target problem.

Mapping Control. Mapping success was controlled by a short test where sub- 
jects had to give relations between the jugs of the source and target 
problem. 

Questionnaire. Finally, mathematical skills (last mark in mathematics, sub- 
jective rating of mathematical knowledge, interest in mathematics) and 
personal data (age and gender) were obtained. 

11.2.2 Results and Discussion 

Overall, there was a monotonic decrease in problem solving success over the
five target problems (see tab. 11.2, lines n and solved). To make sure that
problem solving success was determined by successful transfer of the source
and not by some other problem solving strategy, only subjects who gave a
correct mapping were considered and a solution was rated as transfer success
if the generated solution sequence was the (unique) shortest solution (see
tab. 11.2, lines correct mapping and shortest solution). The variable “transfer
success” was calculated as the percentage of subjects with correct mapping
who generated the shortest solution (see tab. 11.2, line transfer rate).



Table 11.2. Results of Experiment 1

Problem             1       2       3       4       5
structure (a)       ISO     ISO     ISO     P-ISO   P-ISO
surface (b)         no      small   large   no      small
n                   12      12      12      12      12
solved              12      11      9       8       6
correct mapping     8       12      10      9       6
shortest solution   8       10      8       8       3
transfer rate       100%    83.3%   80%     88.9%   50%

(a) ISO = isomorph, P-ISO = partial isomorph
(b) no, small, large change of surface
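
The transfer rates in table 11.2 follow directly from the two preceding rows. As a small worked check, the sketch below divides the number of shortest solutions by the number of correct mappings per problem. The list names are hypothetical and the counts are those read from the table.

```python
# Counts per target problem 1..5, as read from table 11.2.
correct_mapping = [8, 12, 10, 9, 6]
shortest_solution = [8, 10, 8, 8, 3]

# Transfer rate: percentage of subjects with correct mapping
# who generated the shortest solution.
transfer_rate = [
    round(100 * s / m, 1) for s, m in zip(shortest_solution, correct_mapping)
]
for problem, rate in enumerate(transfer_rate, start=1):
    print(f"Problem {problem}: {rate}%")
```

The rate for problem 3 comes out as 8 of 10, that is, 80%.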



Log-file analysis showed that none of the subjects who performed correct
mapping retrieved the source solution while solving the target problem. It
is highly improbable that the correct shortest solution was found randomly 
or that subjects could infer the general solution principle when solving the 
initial and source problem. As a consequence, we have to assume that these 
subjects solved the target by analogical transfer of the memorized (four-step) 
source solution. 

Transfer success decreased nearly monotonically over the five conditions.
Exceptions were problems 3 (isomorph/large surface change) and 4 (partial
isomorph/no surface change). The high percentage of transfer success for
problem 4 indicates clearly that subjects can successfully transfer a
non-isomorphic source problem. Even for problem 5 (partial isomorph/small surface
change) transfer success was 50%.

There is no overall significant difference between the five experimental
conditions (exact 5 × 2 polynomial test¹: P = 0.21). To control interactions
between structural and superficial similarity, different contrasts were calculated
which we discuss in the following.

11.2.2.1 Isomorph Structure/Change in Surface. There is no significant
impact of the variation of superficial features between conditions 1, 2 and
3 (exact binomial-tests: 1 vs. 2 with P = 0.225; 2 vs. 3 with P = 0.558; and 1
vs. 3 with P = 0.168). This finding is in contrast to Reed et al. (1990). Reed
and colleagues showed that subjects’ rating of the suitability of a problem for
solving a given target is highly influenced by superficial attributes. However,
these ratings were obtained before subjects had to solve the target problem.
This indicates that superficial similarity has a high impact on retrieval of a
suitable problem but not on transfer success, as was also shown by Holyoak
and Koh (1987) when contrasting structure-preserving vs. structure-violating
differences.

¹ For this test, the result has to be tested against the number of cell-value
distributions corresponding with the given row and column values. The procedure for
obtaining this number was implemented by Knut Polkehn.

11.2.2.2 Change in Structure/Stable Surface. Changes in superficial 
attributes between conditions 1 and 4 (isomorph vs. partial isomorph, both 
with no surface change) respectively 2 and 5 (isomorph vs. partial isomorph, 
both with small surface change) can be regarded as stable, because the addi- 
tional jug (in condition 4 vs. 1 and condition 5 vs. 2) influences only structural 
attributes. That is, by contrasting these conditions we measure the influence 
of structural similarity on transfer success. There is no significant difference 
between conditions 1 and 4 (exact binomial test: 1 vs. 4 with P = 0.394). But
there is a significant difference between conditions 2 and 5 (exact binomial
test: 2 vs. 5 with P = 0.039): A partial isomorph can be useful for analogical
transfer if it shares superficial attributes with the target, but transfer
difficulty is high if source and target vary in superficial attributes, even if the
mapping is explicitly given!

11.2.2.3 Change in Structure/Change in Surface. The variation of 
superficial attributes between conditions 4 and 5 has a significant impact 
(exact binomial-tests: P = 0.039). As shown for the contrast of conditions 2 
and 5, if problems are not isomorphic, superficial attributes gain importance. 
Of course, this finding is restricted to the special type of source/target pairs 
and variation of superficial attributes we investigated - that is, to cases where 
the target is “larger” than the source and where jugs are always named as 
A, B, C (D), but the names can be associated with jugs of different sizes. 
In this special case, the intuitive constraint of mapping jugs with identical 
names and positions has to be overcome and kept active during transfer. 

To summarize, this first explorative experiment shows that water redistribution
problems are suitable for investigating analogical transfer: most
subjects could solve the target problems, but solution success is sensitive to
variations of source/target similarity. As a consequence of the interaction
found between superficial and structural similarity, in the following, superficial
source/target similarity will be kept high for all target problems and we
will investigate only target problems varying in their structural similarity to
the source (i.e., with stable surface). Finally, the high solution success for
the partial isomorph of “moderately high” structural similarity (condition 4)
indicates that we can investigate source/target pairs with a smaller degree
of structural overlap.



11.3 Experiment 2 

In the second experiment we investigated a finer variation of different types 
and degrees of structural overlap. We focused on two hypotheses about the 
influence of structural similarity on transfer: 

(1) We have been interested in the possibly different effects of different
types of structural overlap on transfer, that is, target exhaustiveness
versus source inclusiveness of problems (cf. fig. 11.1). If one considers a
problem structure as consisting of only relevant declarative and procedural 
information, different types of structural relations result in differences in the 
amount of both common relevant declarative and common relevant proce- 
dural information. Changing the amount of common declarative information 
requires ignoring declarative source information in the case of target exhaus- 
tiveness and additionally identifying declarative target information in the 
case of source inclusiveness. Changing the amount of common procedural 
information means changing the length of the solution (i.e., the minimal
number of pour-operators necessary to solve the problem).

Thus, compared to the source solution target exhaustiveness results in 
a shorter target solution while the target solution is longer for source in- 
clusiveness. Assuming that ignoring information is easier than additionally 
identifying information (Schmid, Mercy, & Wysotzki, 1998) and assuming 
that a shorter target solution is easier to find than a longer one, we expect 
that successful transfer should be more probable for target exhaustive than 
for source inclusive problems. In line with this assumption, Reed et al. (1974) 
reported increasing transfer frequencies for target exhaustive relations, if sub- 
jects were informed about the correspondences between source and target (see 
also Reed et al., 1990). 

(2) While source inclusiveness and target exhaustiveness are special types 
of structural overlap we have also been interested in the overall impact of 
the degree of structural overlap on transfer. We wanted to determine the 
minimum size of the common substructure of source and target problem 
that makes the source useful for analogically solving the target. Or in other 
words, we wanted to measure the degree of the distance between source and 
target structures up to which the source solution is transferable to the target 
problem. 



11.3.1 Method 

Material As initial and source problem we used the same problems as in
experiment 1. As target problems we constructed the following five different
redistribution problems with constant superficial attributes (see appendix CC.7):

Problem 1: a three jug problem solvable with three operators whose struc- 
ture was completely contained in the structure of the source problem 
(condition target exhaustiveness), 

Problem 2: a three jug problem solvable with five operators whose struc- 
ture contained completely the structure of the source problem (condition 
source inclusiveness), 

Problem 3: the partial isomorph problem used before (condition 4 in experiment
1), a four jug problem solvable with five operators whose structure
completely contains the structure of the source problem; this problem
shares a smaller structural overlap with the source than problem 2 (condition
“high” structural overlap), and

Problem 4 and 5: more four jug problems solvable with five operators that 
have decreasing structural overlap with the source; the structures of 
source and target share a common substructure, but both structures have 
additional aspects (conditions “medium” structural overlap and “low” 
structural overlap) 

For all problem structures distances to the source structure were calcu- 
lated using formula 1 (see appendix CC.7). Because of the intrinsic constraints 
of the water redistribution domain, it was not possible to obtain equi-distance 
between problems. Nevertheless, the problems we constructed served as good 
candidates for testing our hypotheses. 

To investigate the effect of the type of structural source/target relation,
the distances of problem 1 (target exhaustive) and problem 2 (source inclusive)
to the source have been kept as low as possible and as similar as
possible: $d_{s1} = 0.16$ and $d_{s2} = 0.17$. As discussed above, it can be expected
that target exhaustiveness leads to a higher probability of transfer success
than source inclusiveness.

Although problem 3 is a source inclusive problem, we used it as an anchor
problem for varying the degree of structural overlap. Target problem 4 differed
moderately from target problem 3 in its distance value ($d_{s3} = 0.37$ vs.
$d_{s4} = 0.55$) while target problem 5 differed from target problem 4 only slightly
($d_{s4} = 0.55$ vs. $d_{s5} = 0.59$). Thus, one could expect strong differences in
transfer rates between conditions 3 and 4 and nearly the same transfer rates
for conditions 4 and 5. We name problems 3, 4, and 5 as “high”, “medium”
and “low” overlap in accordance with the ranking of their distances to the
source problem.

Subjects Subjects were 70 pupils (18 male and 52 female) of a gymnasium in 
Berlin, Germany. Their average age was 16.3 years (minimum 16 and maximum
17 years). The data of 2 subjects was not logged due to technical problems. 
Thus, 68 logfiles were available for data analysis. 

Procedure The procedure was the same as in experiment 1. Each subject 
had to solve one initial problem and one isomorphic source problem first and 
was then presented one of the five target problems. 



11.3.2 Results and Discussion 

49 subjects mapped the jugs from the source to the target correctly. Thus 19 
subjects had to be excluded from analysis. Table 11.3 shows the frequencies 
of subjects who performed the correct mapping between source and target 
and generated the shortest solution sequence, i.e., solved the target problem
analogically (cf. experiment 1).
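
The transfer rates reported in Table 11.3 are the share of subjects with a correct mapping who also produced the shortest solution. As a small check of that arithmetic, the following sketch recomputes the percentages from the counts in the table; the variable and function names are illustrative and not part of the original study.

```python
# Counts per target problem (conditions 1-5), taken from Table 11.3.
correct_mapping = {1: 7, 2: 8, 3: 13, 4: 9, 5: 12}
shortest_solution = {1: 6, 2: 7, 3: 10, 4: 5, 5: 1}


def transfer_rate(problem: int) -> int:
    """Percentage of correctly mapping subjects who found the shortest solution."""
    return round(100 * shortest_solution[problem] / correct_mapping[problem])


rates = {problem: transfer_rate(problem) for problem in correct_mapping}
print(rates)
```

The counts of correct mappings add up to the 49 subjects that entered the analysis.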

Table 11.3. Results of Experiment 2

Problem             1            2           3         4          5
structure           target       source      “high”    “medium”   “low”
                    exhaustive   inclusive   overlap   overlap    overlap

n                   11           10          16        15         16
correct mapping     7            8           13        9          12
shortest solution   6            7           10        5          1
transfer rate       86%          88%         77%       56%        8%


11.3.2.1 Type of Structural Relation. There is no difference in solving 
frequencies between condition 1 and 2 (exact binomial test, P = 0.607). That 
is, there is no indication of an effect of the type of structural source/target 
relation on transfer success. In contrast to the findings of Reed et al. (1974) 
and Reed et al. (1990), it seems that the degree of structural overlap has a
much larger influence than the type of structural relation between source and
target. Furthermore, looking only at the procedural aspect of our problems,
we could not find an impact of the length of the required operator-sequence
(three steps for problem 1 vs. five steps for problem 2) on solution success.

A possible explanation might be that the type of structural relation has 
no effect, if problems are very similar to the source. It is clearly a topic for 
further investigation, to check whether target exhaustive problems become 
superior to source inclusive problems with increasing source/target distances. 

A general superiority of degree over type of overlap could be explained 
by assuming mapping as a symmetrical instead of an asymmetrical (source 
to target) process. Hummel and Holyoak (1997) argue that during retrieval 
the target representation “drives” the process. In contrast, during mapping 
the role of the “driver” can switch from target to source and vice versa. 
During transfer the source structure again takes control of the process. Thus, 
an interaction between these processes must lead to decreasing differences 
between effects of source inclusiveness and effects of target exhaustiveness on 
analogical transfer. 

11.3.2.2 Degree of Structural Overlap. Each problem of conditions 3
to 5 was solvable with a minimum of five operators. That means, there was
one additional operator needed compared to the source solution. Results for
conditions 3 to 5 show a significant difference between the effects of different
degrees of structural source/target overlap on solution frequency (exact
3x2 test, P = 0.002). Pairwise comparison of the frequencies
indicates that the crucial difference between structural distances is between
conditions 4 and 5 (exact binomial test, conditions 3 and 4: P = 0.1; conditions
3 and 5: P < 0.001; conditions 4 and 5: P = 0.0001).

This finding is surprising taking into account that the difference of structural
distance between conditions 3 and 4 is much larger than between conditions
4 and 5 ($d_{s3} = 0.37$, $d_{s4} = 0.55$, $d_{s5} = 0.59$). A possible explanation
is that with problem 5 we have reached the margin of the range of structural
overlap where a problem can be helpful for solving a target problem. A conjecture
worth further investigation is that a problem can be considered as a
suitable source if it shares at least fifty percent of its structure with the target!
An alternative hypothesis is that not the relative but rather the absolute
size of structural overlap determines transfer success, that is, that a source
is no longer helpful to solve the target if the number of nodes contained in
the common sub-structure gets smaller than some fixed lower limit.



11.4 General Discussion 

In our studies we investigated only a small selection of analytically possible 
source/target relations. We did not investigate many-to-one versus one-to- 
many mappings (Spellman & Holyoak, 1996), and we only looked at target 
exhaustiveness versus source inclusiveness for problems with a large com- 
mon structure. We plan to investigate these variations in further studies. For 
source/target relations with a varying degree of structural overlap we were 
able to show that a problem is suitable as source even if it shares only about 
half of its structure with the target. A first explanation for this finding, which
goes along with models of transformational analogy, is that subjects first construct
a partial solution guided by the solution for the structurally identical
part of the solution, and then use this partial solution as a constraint for
finding the missing solution steps by some problem solving strategy, such as
means-end analysis (Newell, Shaw, & Simon, 1958), or by internal analogy
(Hickman & Larkin, 1990).

Internal analogy describes a strategy where a previously ascertained solu- 
tion for a part of a problem guides the construction of a solution for another 
part of the same problem. For the problem domain we investigated, internal 
analogy gives no plausible explanation: The constraints used to figure out 
the solution steps for the overlapping part of the target problem are not the 
same as those used for the non-overlapping part - therefore internal analogy 
cannot be applied. A second explanation might be that subjects try to re-represent
the target problem in such a way that it becomes isomorphic to
the source (Hofstadter & The Fluid Analogies Research Group, 1995; Gentner
et al., 1997). Again, this explanation seems to be implausible for our
domain: Because the number of jugs and given initial, goal, and maximum
quantities determine the solution steps completely, re-representation (for example,
looking at two different jugs as one jug) cannot be helpful for finding
a solution.

The results of the present study give some new insights about the nature of
structural similarity underlying transfer success in analogous problem solving.
While it is agreed upon that application of analogies is mostly influenced by
structural and not by superficial similarity (Reed et al., 1974; Gentner, 1983;
Holyoak & Koh, 1987; Reed et al., 1990; Novick & Holyoak, 1991), there
are only few studies that have investigated which type and what degree of
structural relationship between a source and a target problem is necessary
for transfer success.

Holyoak and Koh (1987) used variants of the radiation problem (Duncker,
1945) to show that structural differences have an impact on transfer. They
varied structural similarity by constructing problems with different solution
constraints. In studies using variants of the missionaries-cannibals problem
structural similarity was varied in the same way (Reed et al., 1974; Gholson
et al., 1996). In the area of mathematical problem solving, typically the complexity
of the solution procedure is varied (Reed et al., 1990; Reed & Bolstad,
1991). While in all of these studies non-isomorphic source/target pairs are
investigated, in none of them the type and degree of structural similarity
was controlled. Thus, the question of which structural characteristics make a
source a suitable candidate for analogical transfer remained unanswered.

Investigating structural source/target relations is of practical interest for 
several reasons: (1) In an educational context (cf. tutoring systems) the pro- 
vided examples have to be carefully balanced to allow for generalization 
(learning). Presenting only isomorphs restricts learning to small problem 
classes, while too large a degree of structural dissimilarity can result in fail- 
ure of transfer and thereby obstructs learning (Pirolli & Anderson, 1985). 

(2) A plausible cognitive model of analogical problem solving (Falkenhainer
et al., 1989; Hummel & Holyoak, 1997) should generate correct transfer only
for such source/target relations where human subjects perform successfully.

(3) Computer systems which employ analogical or case-based reasoning tech- 
niques (Carbonell, 1986; Schmid & Wysotzki, 1998) should refrain from ana- 
logical transfer when there is a high probability of constructing faulty solu- 
tions. Thus, situations can be avoided in which system users have to check 
- and possibly debug - generated solutions. Here information about condi- 
tions for successful transfer in human analogical problem solving can provide 
guidelines for implementing criteria when the strategy of analogical reasoning 
should be rejected in favour of other problem solving strategies. 

In part II we discussed inductive program synthesis as an approach to learning
recursive program schemes. Alternatively, in this chapter, we investigate how 
a recursive program scheme can be generalized from two, structurally similar 
programs and how a new recursive program can be constructed by adapting 
an already known scheme. Generalization can be seen as the last step of 
programming or problem solving by analogy. Adaptation of a scheme to a 
new problem can be seen as abstraction rather than analogy because an 
abstract scheme is applied to a new problem - in contrast to mapping two 
concrete problems. In the following, we first (sect. 12.1) give a motivation 
for programming by analogy and abstraction. Afterwards (sect. 12.2), we 
introduce a restricted approach to second-order anti-unification. Then we 
present first results on retrieval (sect. 12.3), introducing subsumption as a 
qualitative approach to program similarity. In section 12.4, generalization of 
recursive program schemes as anti-instances of pairs of programs is described. 
Finally (sect. 12.5), some preliminary ideas for adaptation are presented.^ 



12.1 Program Reuse and Program Schemes 

It is an old claim in software engineering, that programmers should write less 
code but reuse code developed in previous efforts (Lowry & Duran, 1989). 
In the context of programming, reuse can be characterized as transforming 
an old program into a new program by replacing expressions (Cheatham, 
1984; Burton, 1992). But reuse is often problematic on the level of concrete 
programs because the relation between code fragments and the function they 
perform is not always obvious (Cheatham, 1984). There are two approaches 
to overcome this problem: (1) performing transformations on the program
specifications rather than on the concrete programs (Dershowitz, 1986), and
(2) providing abstract schemes which are stepwise refined by “vertical” program
transformation (Smith, 1985). The second approach proved to be quite
successful and is used in the semi-automatic program synthesis system KIDS
(Smith, 1990, see sect. 6.2.2.3 in chap. 6).

^ This chapter is based on the previous publication Schmid, Sinha, and Wysotzki
(2001).



U. Schmid: Inductive Synthesis of Functional Programs, LNAI 2654, pp. 311-321, 2003. 
© Springer-Verlag Berlin Heidelberg 2003 

In the KIDS system, program schemes, such as divide-and-conquer, local,
and global search, are predefined by the system author and selection of an
appropriate scheme for a new programming problem is performed by the system
user. From a software-engineering perspective, it is prudent to exclude these
knowledge-based aspects of program synthesis from automatization. Nevertheless,
we consider automatic retrieval of an appropriate program scheme
and automatic construction of such program schemes from experience as
interesting research questions.

While the KIDS system is based on deductive (transformational) pro- 
gram synthesis, the context of our own work is inductive program synthesis: 
A programming problem is specified by some input/output examples and 
a universal plan is constructed which represents the transformation of each
possible input state of the initially finite domain into the desired output (see 
part I). This plan is transformed into a finite program tree which is folded 
into a recursive function (see part II). It might be interesting to investigate 
reuse on the level of programming problems - that is, providing a hypo- 
thetical recursive program for a new set of input/output examples omitting 
planning. But currently we are investigating how the folding step can be re- 
placed by analogical transfer or abstraction and how abstract schemes can 
be generalized from concrete programs. 

Our overall approach is illustrated in figure 1.1: For a given finite program
(T) representing some initial experience with a problem, that program
scheme (RPS-S) is retrieved from memory for which its n-th unfolding (T-S)
results in a “maximal similarity” to the current (target) problem. The source
program scheme can either be a concrete program with primitive operations 
or an already abstracted scheme. The source scheme is modified with respect 
to the mapping obtained between its unfolding and the target. Modification 
can involve a simple re-instantiation of primitive symbols or more complex 
adaptations. Finally, the source scheme and the new program are generalized 
to a more abstract scheme. The new RPS and the abstracted RPS are stored 
in memory. 

After introducing our approach to anti-unification, all components of pro- 
gramming by analogy are discussed. An overview of our work on constructing 
programming by analogy algorithms is given in appendix A A. 13. 



12.2 Restricted 2nd-order Anti-unification

12.2.1 Recursive Program Schemes Revisited 

A program is represented as a recursive program scheme (RPS) $\mathcal{S} = (\mathcal{G}, t_0)$,
as introduced in definition 7.1.17. The main program $t_0$ is defined over a
term algebra $T_\Sigma(X)$ where signature $\Sigma$ is a set of function symbols and
$X$ is a set of variables (def. 7.1.1). In the following, we split $\Sigma$ into a set of
primitive function symbols $F$ and a set of user-defined function names $\Phi$ with
$F \cap \Phi = \emptyset$ and $\Sigma = F \cup \Phi$. A recursive equation in $\mathcal{G}$ consists of a function head
$G(x_1, \ldots, x_m)$ and a function body $t$. The function head gives the name of the
function $G \in \Phi$ and the parameters of the function with $X = \{x_1, \ldots, x_m\}$.
The function body is defined over the term algebra, $t \in T_\Sigma(X)$. The body $t$
contains the symbol $G$ at least once. Currently, our generalization approach
is restricted to RPSs where $\mathcal{G}$ contains only a single equation and where $t_0$
consists only of the call of this equation.

Since program terms are defined over function symbols, in general, an 
RPS represents a class of programs. For program evaluation, all variables 
in X must be instantiated by concrete values, all function symbols must be 
interpreted by executable functions, and all names for user-defined functions 
must be defined in F. 

In the following, we introduce special sets of variables and function symbols
to discriminate between concrete programs and program schemes which
were generated by anti-unification. The set of variables $X$ is divided into
variables $X_p$ representing input variables and variables $X_{au}$ which represent
first-order generalizations, that is, generalizations over first-order constants.
The set of function symbols $F$ is divided into symbols for primitive functions
$F_p$ with a fixed interpretation (such as $+(x, y)$ representing addition of two
numbers) and function symbols with arbitrary interpretation $\Phi_{au}$. Symbols in
$\Phi_{au}$ represent function variables which were generated by second-order
generalizations, that is, generalizations over primitive functions. Please note that
these function variables do not correspond to names of user-defined functions
$\Phi$!

Definition 12.2.1 (Concrete Program). A concrete program is an RPS
over $T_\Sigma(X_p)$ with $\Sigma = F_p \cup \Phi$. That is, it contains only input variables in
$X_p$, symbols for primitive functions in $F_p$ and function calls $G_i \in \Phi$ which
are defined in $\mathcal{G}$.

Definition 12.2.2 (Program Scheme). A program scheme is an RPS defined
over $T_\Sigma(X_p \cup X_{au})$ with $\Sigma = F_p \cup \Phi_{au} \cup \Phi$. That is, it can contain input
variables in $X_p$, symbols for primitive functions in $F_p$, function calls $G_i \in \Phi$
which are defined in $\mathcal{G}$, as well as object variables $y \in X_{au}$, and function
variables $\chi \in \Phi_{au}$.

An RPS can be unfolded by replacing the name of a user-defined function
by its body where variables are substituted in accordance with the function
call as introduced in definition 7.1.29. In principle, a recursive equation can be
unfolded infinitely often. Unfolding terminates if the recursive program call
is replaced by $\Omega$ (the undefined term) and the result of unfolding is a finite
program term (see def. 7.1.19) $T \in T_\Sigma(X_p \cup X_{au})$ with $\Sigma = F_p \cup \Phi_{au} \cup \{\Omega\}$.
That is, $T$ does not contain names of user-defined functions $G_i \in \Phi$.
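
The unfolding step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of definition 7.1.29: terms are encoded as nested Python tuples `(symbol, arg, ...)`, variables are strings, constants are plain values, and the RPS consists of a single recursive equation. The names `substitute`, `unfold` and `OMEGA` are hypothetical, and the way unfoldings are counted here may differ from the book's convention.

```python
OMEGA = "Ω"  # stands for the undefined term


def substitute(term, env):
    """Replace variables in `term` according to the mapping `env`."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(arg, env) for arg in term[1:])
    return env.get(term, term)


def unfold(term, name, params, body, n):
    """Unfold calls of the user-defined function `name` n times.

    Calls that remain after n unfoldings are replaced by OMEGA, so the
    result is a finite term without user-defined function names.
    """
    if not isinstance(term, tuple):
        return term
    args = tuple(unfold(arg, name, params, body, n) for arg in term[1:])
    if term[0] != name:
        return (term[0],) + args
    if n == 0:
        return OMEGA
    instantiated = substitute(body, dict(zip(params, args)))
    return unfold(instantiated, name, params, body, n - 1)


# fac(x) = if(eq0(x), 1, *(x, fac(p(x))))
FAC_BODY = ("if", ("eq0", "x"), 1, ("*", "x", ("fac", ("p", "x"))))
once = unfold(("fac", "x"), "fac", ["x"], FAC_BODY, 1)
twice = unfold(("fac", "x"), "fac", ["x"], FAC_BODY, 2)
```

With this encoding, `once` is the body of `fac` with the recursive call replaced by the undefined term, and `twice` contains one further copy of the body instantiated with `p(x)`.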

12.2.2 Anti-unification of Program Terms

First order anti-unification was introduced in section 7.1.2 in chapter 7. In the 
following, we present an extension of Huet’s declarative, first-order algorithm 
for anti-unification of pairs of program terms (Huet, 1976; Lassez et al., 1988) 
by introducing three additional rules. The term that results from the anti- 
unification of two terms is their most specific anti-instance. 

Definition 12.2.3 (Instance/Anti-Instance). If $t_1, \ldots, t_n$ and $u$ are terms
and for each $t_i$ there exists a substitution $\sigma_i$ ($1 \le i \le n$) such that $t_i = u\sigma_i$,
then the terms $t_i$ are instances of $u$ and $u$ is an anti-instance of the set
$\{t_1, \ldots, t_n\}$.

$u$ is the most specific anti-instance of the set $\{t_1, \ldots, t_n\}$ if, for each $u'$
which is also an anti-instance of $\{t_1, \ldots, t_n\}$, there exists a substitution $\theta$
such that $u = u'\theta$.

Simply put, an anti-instance reflects some commonalities shared between
two or more terms, whereas the most specific anti-instance reflects all
commonalities.

Huet’s algorithm constructs the most specific anti-instance of two terms 
in the following way: If both terms start with the same function symbol, this 
function symbol is kept for the anti-instance, and anti-unification is performed 
recursively on the function arguments. Otherwise, the anti-instance of the two
terms is a variable determined by an injective mapping $\varphi$. $\varphi$ ensures that each
occurrence of a certain pair of sub-terms within the given terms is represented
by the same variable in the anti-instance.

According to Idestam-Almquist (1993), $\varphi$ can be considered a term substitution:

Definition 12.2.4 (Term Substitution). A finite set of the form

$[(t_1, u_1)/x_1, \ldots, (t_n, u_n)/x_n]$

is a term substitution, iff $[x_1/t_1, \ldots, x_n/t_n]$ and $[x_1/u_1, \ldots, x_n/u_n]$ are
substitutions, and the pairs $(t_1, u_1), \ldots, (t_n, u_n)$ are distinct.

Huet’s algorithm only computes the most specific first-order anti-instance.
But in order to capture as much of the common structure of two programs 
as possible in an abstract scheme we need at least a second order (function) 
mapping. In contrast to first order approaches, higher order anti-unification 
(as well as unification) in general has no unique solution - that is, a notion 
of the most specific anti-instance does not exist - and cannot be calculated 
efficiently (Siekmann, 1989; Hasker, 1995). 

Therefore, we developed a very restricted second-order algorithm. Its main 
extension compared with Huet’s is the introduction of function variables for 
functions of the same arity. 

Based on the results obtained with this algorithm, we plan careful extensions,
maintaining uniqueness of the anti-instances and efficiency of calculation.
A more powerful approach to second order anti-unification was proposed,
for example, by Hasker (1995), as mentioned in chapter 10.

Our algorithm is presented in table 12.1. The Var-Term, Diff-Arity,
and Same-Function rules correspond to the two cases of Huet’s algorithm.
Same-Term is a trivial rule which makes the implementation more efficient.
Our main extension to Huet’s algorithm is the Same-Arity rule, by which
function variables $\chi \in \Phi_{au}$ are introduced for functions of the same arity.
Huet’s termination case has been split up into two rules (Var-Term and
Diff-Arity) because the Diff-Arity rule can be refined in future versions
of this algorithm to allow generalizing over functions with different arities as
well.



Table 12.1. A Simple Anti-Unification Algorithm

Function Call: $au(t_1, t_2)$ with terms $t_1, t_2 \in \mathcal{M}(V, F \cup \Phi)$

Initialization: term substitution $\varphi = \emptyset$

Rules:

- Same-Term: $au(t, t) = t$

- Var-Term: $au(t_1, t_2) = y$ with $\varphi = \varphi \circ [t_1, t_2/y]$ (where either $t_1 \in X$ and
  $t_2 \in T_\Sigma(X)$, or $t_1 \in T_\Sigma(X)$ and $t_2 \in X$)

- Same-Function: $au(f(x_1, \ldots, x_n), f(x'_1, \ldots, x'_n)) = f(au(x_1, x'_1), \ldots, au(x_n, x'_n))$

- Same-Arity: $au(f(x_1, \ldots, x_n), g(x'_1, \ldots, x'_n)) = \chi(au(x_1, x'_1), \ldots, au(x_n, x'_n))$
  with $\varphi = \varphi \circ [f, g/\chi]$

- Diff-Arity: $au(f(x_1, \ldots, x_n), g(x'_1, \ldots, x'_m)) = y$
  with $\varphi = \varphi \circ [f(x_1, \ldots, x_n), g(x'_1, \ldots, x'_m)/y]$ (where $n \neq m$)



Proofs of uniqueness, correctness, and termination are given in (Sinha,
2000).

To illustrate the algorithm, we present a simple example: Let the two
terms which are input to $au(t_1, t_2)$ be

$t_1 = \mathit{if}(\mathit{eq0}(x), 1, *(x, \mathit{fac}(p(x))))$
$t_2 = \mathit{if}(\mathit{eq0}(z), 0, +(z, \mathit{sum}(p(z))))$

Then,

$au(t_1, t_2) = \mathit{if}(\mathit{eq0}(y_1), y_2, \chi_2(y_1, \chi_1(p(y_1))))$

with

$\varphi = [y_1/(x, z),\ y_2/(1, 0),\ \chi_1/(\mathit{fac}, \mathit{sum}),\ \chi_2/(*, +)]$

as their most specific anti-instance. The rules used are: Same-Function
(twice), Var-Term, Same-Arity (twice), Var-Term (making use of $\varphi$),
Same-Arity, Same-Function, and finally Var-Term (again, with $\varphi$). In
table 12.4, additional examples for anti-unification are given.
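
The rules of table 12.1 can be tried out with a short sketch. It is an illustration under assumptions rather than the implementation referred to in the text: terms are nested tuples `(symbol, arg, ...)`, variables are strings, and constants are plain values that are generalized to an object variable. Generated object variables are named `y1, y2, ...` and function variables `X1, X2, ...`; these names, the helper `var_for`, and the bottom-up numbering are choices made here, and the sketch assumes the input terms do not already use such names.

```python
def anti_unify(t1, t2):
    """Restricted second-order anti-unification of two tuple-encoded terms.

    Returns the anti-instance and the term substitution phi, stored as a
    dict that maps (kind, left, right) to the variable introduced for it.
    """
    phi = {}

    def var_for(kind, left, right):
        # The same pair is always represented by the same variable.
        key = (kind, left, right)
        if key not in phi:
            index = sum(1 for k in phi if k[0] == kind) + 1
            phi[key] = f"{kind}{index}"
        return phi[key]

    def au(a, b):
        if a == b:                                        # Same-Term
            return a
        if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
            args = tuple(au(x, y) for x, y in zip(a[1:], b[1:]))
            if a[0] == b[0]:                              # Same-Function
                return (a[0],) + args
            return (var_for("X", a[0], b[0]),) + args     # Same-Arity
        return var_for("y", a, b)                         # Var-Term, Diff-Arity, constants

    return au(t1, t2), phi


t1 = ("if", ("eq0", "x"), 1, ("*", "x", ("fac", ("p", "x"))))
t2 = ("if", ("eq0", "z"), 0, ("+", "z", ("sum", ("p", "z"))))
u, phi = anti_unify(t1, t2)
```

For the two example terms, `u` has the shape of the anti-instance given in the text, with `X1` standing for the pair (fac, sum) and `X2` for the pair (*, +).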



12.3 Retrieval Using Term Subsumption 

12.3.1 Term Subsumption 

By using subsumption of anti-instances, retrieval can be based on a qualitative
criterion of structural similarity (Plaza, 1995) instead of on a quantitative
similarity measure. The advantage of a qualitative approach is that it is
parameter-free. That is, there is no need to invest in the somewhat tedious 
process of determining “suitable” settings for parameters and no threshold 
for similarity must be given. 

The subsumption relation is defined usually only for clauses (see sect. 6.3.3 
in chap. 6). This definition is based either on the subset/superset relation- 
ship - which exists between clauses since they can be viewed as sets (the 
conjunction of literals is commutative) - or on the existence of a substitution 
which transforms the more general (subsuming) clause into the more special 
(subsumed) one. Since terms cannot be viewed as sets, and therefore the sub- 
set/superset relationship does not exist, our definition of term subsumption 
has to be based on the existence of substitutions only. 

Definition 12.3.1 (Term Subsumption). A term $t_1$ subsumes a term $t_2$
(written $t_2 \le t_1$) iff there exists a substitution $\theta$ such that $t_1\theta = t_2$.

Note that $\theta$ must be a “proper” substitution, not a term substitution
(def. 12.2.4).

Substitution $\theta$ is obtained by unifying $t_1$ and $t_2$. If $\theta$ is empty and $t_1 \neq t_2$,
unification of $t_1$ and $t_2$ has failed. In that case, the subsumption relation
between $t_1$ and $t_2$ is undecidable. That is, the subsumption relation does not
define a total but a partial order over terms. For two isomorphic terms, that
is, terms which can be unified by variable renaming only, we write $t_1 \approx t_2$.^
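
Checking whether one term subsumes another amounts to finding a substitution that turns the more general term into the more special one. The sketch below does this by one-way matching on tuple-encoded terms `(symbol, arg, ...)`. It rests on assumptions that go beyond the text: the set of variable names is passed in explicitly, and a function variable may be bound to a function symbol of the same arity. The names `match` and `subsumes` are hypothetical.

```python
def match(general, special, variables, theta):
    """Extend theta so that general instantiated by theta equals special.

    Returns the extended substitution, or None if no such substitution exists.
    """
    if isinstance(general, str) and general in variables:
        if general in theta:
            return theta if theta[general] == special else None
        theta[general] = special
        return theta
    if (isinstance(general, tuple) and isinstance(special, tuple)
            and len(general) == len(special)):
        # The head is matched like an argument, so a function variable
        # can be bound to a function symbol.
        for g, s in zip(general, special):
            if match(g, s, variables, theta) is None:
                return None
        return theta
    return theta if general == special else None


def subsumes(general, special, variables):
    """True iff `general` subsumes `special` under the assumptions above."""
    return match(general, special, variables, {}) is not None


scheme = ("if", ("eq0", "y1"), "y2", ("X2", "y1", ("X1", ("p", "y1"))))
fac = ("if", ("eq0", "x"), 1, ("*", "x", ("fac", ("p", "x"))))
scheme_vars = {"y1", "y2", "X1", "X2"}
```

Here `subsumes(scheme, fac, scheme_vars)` holds, while the reverse direction fails; two terms such as `f(y1)` and `g(y1)` with fixed symbols `f` and `g` are incomparable in both directions.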

Our algorithm for retrieving the “maximally similar” RPSs from a linear
case base is given in table 12.2. In general, the algorithm returns a set of
possible source programs rather than a unique source due to the undecidability
of subsumption for non-unifiable terms. Input to the algorithm is a finite
program $T$. For each RPS in memory, this RPS $S_i$ is unfolded to a term $T_i$
such that it has at least the length of $T$ and the most specific anti-instance $u_i$
of $T$ and $T_i$ is calculated. This anti-instance is only inserted in the set of
anti-instances $\mathcal{U}$ if it is either subsumed by (i.e., is more special than) an anti-instance
$u \in \mathcal{U}$ or if it is not unifiable with any $u \in \mathcal{U}$. Furthermore, all anti-instances
in $\mathcal{U}$ which subsume (i.e., are more general than) $u_i$ are removed from $\mathcal{U}$.
The algorithm returns all RPSs $S_i$ from the case base which are associated
with a $u_i$ in the final set of anti-instances $\mathcal{U}$.

^ Note that here we speak of constructing an order over terms and isomorphism
is defined with respect to the unification of terms. Later in this chapter, we will
discuss isomorphism in the context of anti-unification, that is, generalization, of
terms.



Table 12.2. An Algorithm for Retrieval of RPSs 
Given

$T$ - a finite program.

$CB = \{R_1, \ldots, R_n\}$ - a case base which is a set (represented as a
list) of RPSs $S_i$.

$\mathcal{U}$ - a set (initially empty) of most specific anti-instances.

$\forall R_i \in CB$:

— Let $T_i = unfold(S_i)$. (unfold $S_i$ according to def. 7.1.29 until the
resulting term $T_i$ has at least the length of $T$)

— Let $u_i = au(T, T_i)$. (anti-unify $T$ and $T_i$)

— If 3 m G 7/ with Ui fa u or u < Ui then Li. 

— $\forall u \in \mathcal{U}$ with $u_i < u$:
$\mathcal{U} := (\mathcal{U} \setminus u) \cup u_i$

— Return the RPSs $S_i$ associated with all $u_i \in \mathcal{U}$.
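
The retrieval procedure described in the text can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not the
implementation used in this book: terms are nested tuples, the case base
already contains the unfolded terms $T_i$ (unfolding is not shown),
anti-unification is plain first-order (it does not generalize over function
symbols), and anti-instances that are isomorphic to a stored one are kept.
All function names are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(general, special, theta):
    """Extend theta so that general*theta == special; None on failure."""
    if is_var(general):
        return theta if theta.setdefault(general, special) == special else None
    if isinstance(general, tuple) and isinstance(special, tuple):
        if general[0] != special[0] or len(general) != len(special):
            return None
        for g_arg, s_arg in zip(general[1:], special[1:]):
            if match(g_arg, s_arg, theta) is None:
                return None
        return theta
    return theta if general == special else None


def subsumes(general, special):
    return match(general, special, {}) is not None


def strictly_more_specific(a, b):
    """True iff b subsumes a but a does not subsume b."""
    return subsumes(b, a) and not subsumes(a, b)


def anti_unify(s, t, pairs):
    """First-order most specific generalization of s and t."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(anti_unify(a, b, pairs)
                               for a, b in zip(s[1:], t[1:]))
    # Differing sub-terms: the same pair always receives the same variable.
    return pairs.setdefault((s, t), f"?v{len(pairs) + 1}")


def retrieve(target, unfolded_cases):
    """Return the names of the cases with most specific anti-instances."""
    kept = []  # pairs (anti-instance u, case name)
    for name, term in unfolded_cases.items():
        u_i = anti_unify(target, term, {})
        # Discard u_i if a stored anti-instance is strictly more specific.
        if any(strictly_more_specific(u, u_i) for u, _ in kept):
            continue
        # Remove stored anti-instances that are strictly more general.
        kept = [(u, n) for u, n in kept if not strictly_more_specific(u_i, u)]
        kept.append((u_i, name))
    return [name for _, name in kept]


# Toy terms (one unfolding each, hypothetical case base).
target = ("if", ("eq0", "x"), "0", ("+", "x", "Omega"))
cases = {
    "fac": ("if", ("eq0", "x"), "1", ("*", "x", "Omega")),
    "member": ("if", ("null", "l"), "false",
               ("if", ("eq", "x", ("head", "l")), "true", "Omega")),
    "mult": ("if", ("eq0", "x"), "0", ("+", "y", "Omega")),
}
result = retrieve(target, cases)
```

Because subsumption is only a partial order, `retrieve` may return several
names; incomparable anti-instances simply coexist in `kept`.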



12.3.2 Empirical Evaluation 

We presented terms to six subjects (all of whom were familiar with
functional programming) asking them to identify similarities among the terms.
All these terms were RPSs unfolded twice. We chose ten RPSs which we
considered “relevant”. The RPSs represented various recursion types, such as
linear recursion (SUM, FAC, APPEND, CLEARBLOCK), tail recursion (MEMBER,
REVERT), tree recursion (BINOM, FIBO, LAH), and a function calling a further
function (GGT calling MOD). All functions are given in appendix CC.8.

In each rating task the subjects were presented with one term and a “case
base” consisting of the other nine terms. The order in which the case base
was presented varied among subjects. They were requested to choose the
term from the case base which was most similar to the given term.

In table 12.3, the results of this rating study are shown. 

The rightmost column shows the results of our algorithm. Only for the 
CLEARBLOCK problem (3) the algorithm yields a unique result. In all other 
cases, a set of “most similar” terms is returned. This is due to both sub- 
sumption equivalences and the fact that subsumption is not always decidable 
(see section 12.3.1). 

The RPS preferred by a majority (two-thirds or more) of the subjects
was always among the RPSs returned by the algorithm (except for GGT).
Moreover, there are symmetries in retrieval (e.g., FAC and SUM) that can be
explained by a common recursion scheme of these RPSs. Generally, at least
one RPS of the same recursion type as the goal RPS is retrieved, provided
that there is such an RPS in the case base.

Table 12.3. Results of the similarity rating study

Goal RPS      Subjects’ Choice                 Majority   Program Returns
REVERT        4: 33%, 2, 3, 5, 9: 17% each     -          {7, 8, 10}
SUM           5: 83%, 10: 17%                  5          {5, 10}
CLEARBLOCK    6: 100%                          6          {6}
GGT           1: 67%, 6: 33%                   1          {8, 10}
FAC           2: 100%                          2          {2, 8, 9}
APPEND        3: 83%, 2: 17%                   3          {1, 3, 7}
MEMBER        10: 33%, 1, 2, 4, 6: 17% each    -          TbO]
LAH           10: 50%, 9: 33%, 6: 17%          -          {9, 10}
BINOM         10: 100%                         10         {7, 8, 10}
FIBO          9: 50%, 8: 33%, 3: 17%           -          {2, 8, 9}

12.3.3 Retrieval from Hierarchical Memory 

We demonstrated retrieval of concrete programs or abstract schemes for the 
case of a linear memory. Memory can be organized more efficiently by intro- 
ducing a hierarchy of programs. That is, for each pair of programs which were 
used as source and target in programming by analogy, their anti-instance is 
introduced as parent. 

For hierarchical memory organization, retrieval can be restricted to a
subset of the memory in the following way: Given a new finite program term
$T$,

— identify the most specific anti-instance $au(T, T_i)$ for all unfolded RPSs
which are root-nodes,

— in the following only investigate the children of this node.

As in the linear case, in general, there might not exist a unique minimal 
anti-instance, that is, more than one tree in the memory must be explored. 
Furthermore, retrieval from hierarchical memory is based on a monotonicity
assumption: If a program term $T_i$ is less similar to the current term $T$
than a program term $T_j$ then the children of $T_i$ cannot be more similar to
$T$ than $T_j$ and the children of $T_j$. While this assumption is plausible, it is
worth further investigation. Our work on memory hierarchization is work in
progress and we plan to provide a proof for monotonicity together with an
empirical comparison of the efficiency of retrieval from hierarchical versus
linear memory.

12.4 Generalizing Program Schemes

There are different possibilities to obtain an RPS $S_{1,2}$ which
generalizes over a pair of RPSs $S_1$ and $S_2$:

— The generalized program term $T_{1,2}$ which was constructed as
$au(T_1, T_2)$ with $T_1 = unfold(S_1)$ and $T_2 = unfold(S_2)$ can be folded,
using our standard approach to program synthesis described in chapter 7.

— The term-substitution $\varphi$ obtained by constructing $au(T_1, T_2)$ can
be extended to $\varphi' = \varphi \circ [G_1, G_2 / x_{1,2}]$, that is, by
introducing a variable for the names of the recursive functions. Then, for our
restricted approach dealing with functions of different arity in a first-order
way, it is enough to apply $\varphi'$ to either $S_1$ or $S_2$. That is, for
term-substitutions $\varphi' = \{(T_{1,i}, T_{2,i} / y_i) \mid i = 1..n\}$
where $y_i \in X_{au} \cup [\ldots]$, with projections
$\sigma_1 = \{(T_{1,i} / y_i)\}$, $\sigma_2 = \{(T_{2,i} / y_i)\}$ it holds
that $\sigma_1(S_1) = \sigma_2(S_2)$.

— The anti-unification algorithm defined above can be applied to recursive
equations $G_i(x_1, \ldots, x_n) = t_i$, if the equations are reformulated as
equality terms: (= (Gi x1 ... xn) ti).

All three possibilities have their advantages: Folding of abstracted terms
can be used to check whether the obtained generalization really represents
a recursive program. Applying term-substitutions is the most parsimonious
approach to generalization. Applying anti-unification directly to recursive
programs makes it possible to separate reuse and learning.

To illustrate our anti-unification algorithm, we give examples for general- 
ization of recursive equations in table 12.4. The results are edited for better 
readability. In each example, we used a function for calculating the factorial 
of a number as first instance. If factorial is anti-unified with sum, the re- 
sulting scheme preserves the complete structure of both programs. Factorial 
and sum can be considered as syntactical isomorphs, that is, we can map 
the symbols from factorial into sum in a consistent way. A similar result is 
obtained for incrementing the elements of a list. If factorial is anti-unified 
with a program for calculating the square of a number, the resulting scheme 
abstracts from the first argument of the “else”-case, which is a “constant” for
factorial and a more complex term for sqr. Anti-unification of factorial
and fibonacci returns a very general scheme because the programs have very 
different structures: one versus two conditionals and a linear recursion versus 
a tree-recursion. The same is true for functions with different numbers of 
parameters, as factorial and mult. 
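
The generalization of recursive equations described above can be sketched in
Python. This is a hedged illustration, not the algorithm of this chapter:
equations are encoded as nested tuples with head `"="`, function symbols of
equal arity are generalized by function variables named `F1, F2, ...`,
everything else by term variables `Y1, Y2, ...`. These names, the dictionary
`phi` used as term substitution, and the helper `instantiate` are assumptions
for illustration; the variable numbering therefore differs from the one used
in table 12.4.

```python
def fresh(phi, prefix, s, t):
    """Return the variable already standing for (s, t), or create a new one."""
    for var, pair in phi.items():
        if pair == (s, t):
            return var
    var = f"{prefix}{sum(v.startswith(prefix) for v in phi) + 1}"
    phi[var] = (s, t)
    return var


def anti_unify(s, t, phi):
    """Generalize s and t; phi collects the term substitution."""
    if s == t:
        return s
    if isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t):
        # Same arity: keep the structure, abstract a differing head symbol.
        head = s[0] if s[0] == t[0] else fresh(phi, "F", s[0], t[0])
        return (head,) + tuple(anti_unify(a, b, phi)
                               for a, b in zip(s[1:], t[1:]))
    # Different arity or atoms: abstract the whole pair of sub-terms.
    return fresh(phi, "Y", s, t)


def instantiate(term, phi, side):
    """Rebuild the first (side=0) or second (side=1) original program."""
    if isinstance(term, tuple):
        head = phi[term[0]][side] if term[0] in phi else term[0]
        return (head,) + tuple(instantiate(a, phi, side) for a in term[1:])
    return phi[term][side] if term in phi else term


fac = ("=", ("fac", "x"),
       ("if", ("eq0", "x"), "1", ("*", "x", ("fac", ("pred", "x")))))
sum_ = ("=", ("sum", "z"),
        ("if", ("eq0", "z"), "0", ("+", "z", ("sum", ("pred", "z")))))
fibo = ("=", ("fibo", "z"),
        ("if", ("eq0", "z"), "0",
         ("if", ("eq0", ("pred", "z")), "1",
          ("+", ("fibo", ("pred", "z")),
           ("fibo", ("pred", ("pred", "z")))))))

phi = {}
scheme = anti_unify(fac, sum_, phi)
general_scheme = anti_unify(fac, fibo, {})
```

For fac and sum the scheme keeps the complete structure of both programs,
while for fac and fibo the whole “else”-case collapses into a single
variable, because the two sub-terms differ in arity. Since `phi` stores both
sides of every abstracted pair, `instantiate` can reconstruct either original
equation from the scheme.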

Because the result of anti-unification is a generalized term together with
a term substitution $\varphi$ (see def. 12.2.4), the original programs can be
reconstructed. The term substitution can be considered as representing a (very
limited) domain for interpreting the variables in the anti-instance. For
example, in the first anti-instance given in table 12.4, $x_2/(1, 0)$ restricts
the interpretation of $x_2$ to be zero or one. If we want to obtain generalized
function schemes from anti-unification, we keep only the anti-instances and
get rid of the term substitutions. Without constraints, the variables can now
be interpreted in an arbitrary way, that is, instantiation of an anti-instance
can result in “stupid” programs. For example, we might interpret $y_1$ as a
list, $x_1$ as a truth-value, and $x_3$ as list-constructor. In the context of
automatic programming, however, there is always some information about the new
(target) programming problem which restricts the instantiation of a scheme. A
planned extension of our approach is to incorporate type-information into
anti-unification. Thereby, it can at least be guaranteed that the instantiated
expressions of an anti-instance are type-consistent.

Table 12.4. Example Generalizations

t1:  fac(x)     = if(eq0(x), 1, *(x, fac(pred(x))))
t2:  sum(z)     = if(eq0(z), 0, +(z, sum(pred(z))))
     x1(y1)     = if(eq0(y1), x2, x3(y1, x1(pred(y1))))

t1:  fac(x)     = if(eq0(x), 1, *(x, fac(pred(x))))
t2:  incl(l)    = if(null(l), nil, cons(succ(car(l)), incl(tail(l))))
     x1(y1)     = if(x2(y1), x3, x5(y3, x1(x4(y1))))

t1:  fac(x)     = if(eq0(x), 1, *(x, fac(pred(x))))
t2:  sqr(z)     = if(eq0(z), 0, +(+(z, pred(z)), sqr(pred(z))))
     x1(y1)     = if(eq0(y1), x2, x3(y3, x1(pred(y1))))

t1:  fac(x)     = if(eq0(x), 1, *(x, fac(pred(x))))
t2:  fibo(z)    = if(eq0(z), 0, if(eq0(pred(z)), 1,
                     +(fibo(pred(z)), fibo(pred(pred(z))))))
     x1(y1)     = if(eq0(y1), x2, y2)

t1:  fac(x)     = if(eq0(x), 1, *(x, fac(pred(x))))
t2:  mult(u v)  = if(eq0(u), 0, +(v, mult(pred(u), v)))
     y1         = if(eq0(y2), x1, x2(y3, y4))



12.5 Adaptation of Program Schemes 

If an RPS RPS-S is retrieved from memory for a current problem $T$, this
RPS must be adapted such that the resulting RPS defines a language to
which $T$ belongs (see def. 7.1.20). Adaptation can be performed by applying
the term substitutions obtained by anti-unifying $T$ and T-S to the given
RPS-S. For example, anti-unification of the third unfolding of RPS fac (as
source) and the third unfolding of sum (as new finite program with unknown
recursive generalization) results in
$\varphi = \{y_1/(x, z), y_2/(1, 0), x_1/(*, +)\}$. The given RPS fac can be
adapted by replacing 1 by 0 (first-order) and * by + (second-order) and by
renaming variable x to z.

A more complicated case occurs if identical symbols in one program are
mapped to different symbols in another program. For the finite programs sub
and add (see fig. 12.1), anti-unification results in
$\varphi = \{x_1/(pred, succ)\}$. This term substitution was obtained for the
sub-terms at positions $3.2.\lambda$ and $3.3.\lambda$ of both finite programs
while the sub-terms at positions $3.1.\lambda$ are identical (pred(x)). Simply
applying the mappings given in $\varphi$ to the RPS sub results in an
unintended (and non-terminating) RPS
add(x, y) = if(eq0(x), y, add(succ(x), succ(y))). This error can be easily
detected, because the given finite program does not belong to the language of
the constructed RPS, that is, the finite program cannot be generated by
unfolding the RPS. To deal with such cases, the context of the substitution
terms must be considered, that is, the function of which the sub-terms are
arguments and their position (Schmid et al., 1998; Hasker, 1995).
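
Both cases can be replayed with a short Python sketch. It is an illustration
under several assumptions rather than the adaptation procedure itself: terms
are nested tuples, only one unfolding is used instead of three, the renaming
of the function names (fac to sum, sub to add) is added by hand, and the
source equation for sub is reconstructed from the description above. The
helper names `rename` and `unfold` are hypothetical.

```python
def rename(term, mapping):
    """Context-free replacement: every occurrence of a key is rewritten."""
    if isinstance(term, tuple):
        return tuple(rename(part, mapping) for part in term)
    return mapping.get(term, term)


def unfold(name, params, body, depth):
    """Unfold name(params) = body; remaining recursive calls become Omega."""
    def expand(term, left):
        if not isinstance(term, tuple):
            return term
        args = tuple(expand(arg, left) for arg in term[1:])
        if term[0] != name:
            return (term[0],) + args
        if left == 0:
            return "Omega"
        # Replace the call by the body with parameters bound to the arguments.
        return expand(rename(body, dict(zip(params, args))), left - 1)
    return expand(body, depth)


# Case 1: fac is adapted to sum with the term substitution from the text.
fac_body = ("if", ("eq0", "x"), "1", ("*", "x", ("fac", ("pred", "x"))))
phi = {"y1": ("x", "z"), "y2": ("1", "0"), "x1": ("*", "+")}
mapping = dict(phi.values())
mapping["fac"] = "sum"          # assumption: the name is renamed as well
sum_body = rename(fac_body, mapping)
target_sum = ("if", ("eq0", "z"), "0",
              ("+", "z", ("if", ("eq0", ("pred", "z")), "0",
                          ("+", ("pred", "z"), "Omega"))))
sum_ok = unfold("sum", ("z",), sum_body, 1) == target_sum

# Case 2: naive mapping pred -> succ applied to sub (assumed definition).
sub_body = ("if", ("eq0", "x"), "y", ("sub", ("pred", "x"), ("pred", "y")))
naive_add = rename(sub_body, {"pred": "succ", "sub": "add"})
intended_add = ("if", ("eq0", "x"), "y",
                ("add", ("pred", "x"), ("succ", "y")))
target_add = unfold("add", ("x", "y"), intended_add, 1)
add_ok = unfold("add", ("x", "y"), naive_add, 1) == target_add
```

Here `sum_ok` holds because the adapted equation regenerates the given finite
program, while `add_ok` is false: the naively adapted equation cannot produce
the target term, which is exactly the check by unfolding mentioned above.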




Target: add(x, y) = if(eq0(x), y, add(pred(x), succ(y)))
Fig. 12.1. Adaptation of Sub to Add

We introduced analogy and abstraction as an alternative to inductive program
synthesis by folding of initial trees. In general, analogy is not more
efficient than folding for generalizing recursive program schemes from finite
programs. Nevertheless, it is a method which we find interesting for two
reasons: First, using anti-unification, mapping and generalization can be
realized within the same framework and we can demonstrate (currently only with
toy-examples) how abstract program schemes can be learned from some initial
programming experience. Second, analogy is considered as a crucial strategy in
human problem solving and learning.



13.1 Learning and Applying Abstract Schemes 

Our approach to programming by analogy using anti-unification mainly ad- 
dresses the problem of obtaining abstract schemes from some concrete pro- 
gramming experience. If such schemes are acquired, a deductive approach to 
program synthesis by stepwise refinement - as proposed by (Smith, 1990) - 
can be used. Currently, mainly in the context of object-oriented programming,
a similar concept - design patterns - is discussed as an approach to program
reuse and an important contribution to productivity of software development
(Gamma, Helm, Johnson, & Vlissides, 1996). Typically, program schemes or 
design patterns are provided by experts as aggregation and abstraction of 
their programming experience. Our research aims at getting more insight into 
how such abstractions are obtained by human programmers and at exploiting 
this insight for (partial) automation of the construction of program schemes. 
Additionally, we address the problem of retrieving a suitable scheme for a 
given new programming problem. By using subsumption of anti-instances, re- 
trieval can be based on a qualitative criterion of structural similarity (Plaza, 
1995) instead of on a quantitative similarity measure. 

Currently, our approach is only a minimal extension of first-order
anti-unification, which allows generalizing over functions with the same
arity. General second-order anti-unification has the drawback that uniqueness
is lost, mainly because it is open which pair of arguments of two terms are
anti-unified. To constrain second-order anti-unification, we plan to introduce
function evaluation, similar to the proposal of Kolbe and Walther (1995) in
the context of reusing proofs. Thereby we also extend the purely syntactical
approach to similarity and generalization by semantic aspects. Even with this
additional information, typically not a single source candidate but a set of
RPSs will be retrieved from memory.

While in the system KIDS a knowledgeable user selects the RPS which
underlies the program to be constructed, we are interested in identifying
heuristics for automatic selection. In a first step, candidate RPSs should be
evaluated with respect to their structural similarity to the target RPSs. As we 
have seen in the previous chapter, it is not very helpful to transfer a scheme to 
a problem belonging to a different recursion class - such as transferring linear 
recursive factorial to tree recursive fibonacci (see tab. 12.4). For this step 
an interleaving of folding and analogy might be helpful: Information about 
hypothetical positions of recursive calls, which are known for the source RPS, 
can be superimposed on the new finite program term and it can be checked 
whether this results in a valid recursive segmentation. Additionally, criteria of 
structural overlap between finite programs, as investigated in chapter 11 can 
be employed to decide whether it is worthwhile to try analogical transfer. If 
after this initial reduction of source candidates more than one RPS remains, 
information about function types can support the final selection. Note that
such a strategy is complementary to cognitive science approaches where in a
first step, candidates sharing superficial features are obtained and in a second
step, the best mapping candidate is selected.

Currently, we are using analogy or abstraction as an alternative to folding. A
more powerful approach would be to apply analogy already for a given prob- 
lem specification - the set of operators and top-level goals given as input to 
DPlan: Each RPS could be associated with the planning domain from which 
it was originally constructed and for a new planning domain, a structurally 
similar domain could be identified for which a recursive control program is 
already given. First ideas for analyzing planning domains with respect to 
their underlying recursive structure were proposed by Wolf (2000). 



13.2 A Framework for Learning from Problem Solving 

Now we have introduced all components of the system proposed in chapter
1: universal planning, inductive program synthesis, and programming by
analogy and abstraction. The system is still in its infancy, but we hope that
we could demonstrate that the combination of these three areas of research
can be fruitful for program synthesis.

Most work in machine learning research addresses concept learning; some
work is done in the area of skill acquisition. These fields cover only a
fraction of the power of human learning. We believe that our system models
some aspects of human learning from problem solving: Exploring a new domain
by search, inducing a recursive generalization, which represents a problem
solving strategy, and incrementally, with growing experience over different
domains, constructing a hierarchy of abstract schemes. Folding of finite terms
models generalization learning by detecting a pattern in a sequence of actions.
Abstraction as a result of analogical problem solving models generalization
learning by detecting a common structure between solutions of different
problems. Both approaches address one possible source of human creativity: the
discovery of problem solving strategies - or algorithms - from some example
experience.



13.3 Application Perspective 

In this book we introduced basic concepts and algorithms for automating
(parts of) the process of program construction and generalization learning.
We presented learning of recursive control rules for AI planning as a field of
application. In the future, we hope to use the ideas presented in this book as
a basis for further areas of application (which we already mentioned in
chap. 1). We plan to apply learning of recursive control rules to proof
planning. As in general AI planning, there is a need to guide search in
automatic theorem proving. An obvious area of application is learning from
user traces to support end-user programming. Nowadays, nearly everybody uses
a computer but only a small fraction of people know how to program. There
already exist some tools for generating “macros” from observation of users’
input behavior in a software system, be it a text editor, a graphic editor, or
spreadsheet analysis. Learning not only linear sequences of operations, which
is what current tools provide, but also generalizing to loops (recursion) can
make end-user support more powerful.



Everybody [...] omits something, if only because to include everything is im- 
possible. 



— Nero Wolfe in: Rex Stout, Poison à la Carte, 1960

Appendices




A. Implementation Details 



A.1 Short History of DPlan

The original idea of using universal planning as initial step for inductive 
program synthesis was presented in Wysotzki (1987): sets of basic programs 
and axioms are used to construct a kind of conditional plan with top-level 
goals as root-node; ordering of dependent goals was realized by backtracking 
over plan construction. From 1995 to 1997 we explored (implemented and 
tested) several strategies for generating finite programs by planning: 

— The approach proposed in Wysotzki (1987) was implemented in different
versions by Ute Schmid, by Baback Parandian, and by Olaf Brandes (see
Parandian, Schmid, and Wysotzki, 1995).

— Furthermore, we investigated combining forward search with decision tree 
learning: 

As first step, optimal plans are generated for each possible initial state of a 
domain with small complexity. Initial states then are represented as feature 
vectors where the set of all different literals occurring over states are used 
as features with value 1 if this literal occurs in a state description and value 
0 otherwise. Each initial state is associated with the (or an) optimal action 
sequence for transforming it into a goal state. A decision tree algorithm 
(CAL2, see Unger & Wysotzki, 1981) is used to generate a classification
program.

Transforming such a program into a finite program fit for generalization-
to-n involves the same problems as discussed in chapter 8 along with some
additional problems: First, the decision tree does not necessarily represent
an order over the number of transformations involved: it is possible
that the first attribute already branches to a leaf representing the most
complex transformation sequence. Second, there is no interleaving between
actions and conditions which have to be fulfilled before executing these
actions, which is typical for recursion (see for example the finite program
for unload-all in chap. 8). Instead, all conditions necessary to execute the
complete transformation sequence are checked first (along a path in the
decision tree) and the transformation sequence is executed completely
afterwards. A possible way to split such compound actions is discussed in
Wysotzki (1983).¹

A second forward-search approach is presented in (Briesemeister et al.,
1996) (see sect. 2.5.2).

— In 1998 we came up with the first version of DPlan as a state-based
non-linear backward-planner constructing universal plans (implemented by Ute
Schmid). This first algorithm, its formalization, proofs of completeness and
correctness are presented in (Schmid, 1999). DPlan1.0 works for STRIPS-like
domain specifications extended to binary conditioned effects. The generated
universal plan is a minimal spanning tree. That is, for domains with
sets of optimal solutions only one alternative is calculated for each possible
state. DPlan1.0 was extended over the last year by several people in several
ways:

— Domain specification in PDDL, STRIPS + general conditioned effects
(see report of the student project “Extending DPlan To an ADL-subset”
by Janin Toussaint, Ulrich Wagner, and Hakan Mean, 1999,
http://ki.cs.tu-berlin.de/~schmid/mlpj2/ss99/mlpj2_99.html).

— Constructing universal plans without a predefined set of states (see re- 
port of the student project “Universal Planning without state sets” by 
Michael Christmann and Stefan Ronnecke, 1999, URL as above). 

— Preliminary work on plan construction using control knowledge repre- 
sented as recursive macros (see report of the student project “Planning 
with recursive macros”, Mischa Neumann and Ralf Ansorg, 1999, URL 
as above). 

— Calculating DAG’s instead of minimal spanning trees (Schmid, 1999). 

— Extending DPlan to function application, as reported in chapter 4. This
work is based on a diploma thesis by Marina Muller, May 2000
(http://ki.cs.tu-berlin.de/projects/dipl.html).

— Current work in progress is to extend DPlan to further features of PDDL -
all-quantification and negation - and to investigate soundness and
completeness of planning for these more general operator definitions (diploma
thesis by Bernhard Wolf, Sept. 2000,
http://ki.cs.tu-berlin.de/projects/dipl.html).

— DPlan was extended (January 2001) to include recursive program schemes
fulfilling a certain class of goals in the domain specification. If the
top-level goals of a new planning problem match with the goals of a recursive
program scheme, this scheme is applied and an optimal transformation sequence
is constructed directly, without planning. This extension is currently being
realized by Janin Toussaint as part of her diploma thesis on plan
transformation for program synthesis (see below).



¹ This work is not documented in a paper. The documented program together with
some examples can be obtained from Ute Schmid.

A.2 Modules of DPlan



DPlan is implemented in Common Lisp. The program together with example
domains and further information can be obtained via
http://ki.cs.tu-berlin.de/~schmid/IPAL/dplan.html.



— dplan.lsp: main program

construction of universal plan, saved in plan-steps as list of pstep-
structures; recursive calculation of pre-images, as described in table 3.1

— plan-dstruc.lsp: global data structure pstep, see below

— ps-back.lsp: calculating the set of legal predecessors of a state (match
and backward operator application)

function (apply-rules <current-state>) is called from dplan, returns
list of new psteps; the new psteps are pruned with respect to the current
plan

— showplan.lsp: graphical and term output of plans

function (show plan-steps) is called from dplan; graphs can be displayed
with graphlet, trees can be displayed with xterm

— parse.lsp, selectors.lsp: handling the PDDL-input (implemented by
Marina Muller)

— generate.lsp, range.lsp, update.lsp: handling update effects for planning
with function application (implemented by Marina Muller)



A plan-step is a pair (instop, child). Additional information is given
for construction of graphical outputs and for plan transformation:

(defstruct pstep
  instop   ; instantiated operator, cf. puttable(A)
  parent   ; parent node: in backward planning successor of instop
           ; input in ps-back "state"
  child    ; child node: in backward planning predecessor of instop
           ; constructed in ps-back
  prec     ; instantiated preconditions of operator
           ; for conditioned operators: union of global and specific
           ; precondition
  add      ; instantiated add-list
  del      ; instantiated del-list
  nodeid   ; identifier
  level    ; level in the ms-dag
  )

A.3 DPlan Specifications

DPlan specifications are given in standard PDDL. For a complete description
of the syntax of PDDL see (McDermott, 1998b). We give a Tower of Hanoi
problem as example:

(define (domain disk-world-domain)
  (:action move
    :parameters (?d ?x ?y)
    :precondition (and (on ?d ?x) (clear ?d) (clear ?y) (smaller ?d ?y))
    :effect
    ((and ((on ?d ?y) (clear ?x))
          (not ((on ?d ?x) (clear ?y)))
    ))
  ))

(define (problem towers-of-hanoi)
  :domain 'disk-world-domain
  :goal ((on d3 p1) (on d2 d3) (on d1 d2))
  )

(define (states states-in-disk-world-domain)   ; domain restriction
  :states                                       ; complete goal state
  (((on d3 p1) (on d2 d3) (on d1 d2) (clear d1) (clear p2) (clear p3)
    (smaller d1 p1) (smaller d2 p1) (smaller d3 p1) (smaller d4 p1)
    (smaller d1 p2) (smaller d2 p2) (smaller d3 p2) (smaller d4 p2)
    (smaller d1 p3) (smaller d2 p3) (smaller d3 p3) (smaller d4 p3)
    (smaller d1 d2) (smaller d1 d3) (smaller d2 d3) (smaller d1 d4)
    (smaller d2 d4) (smaller d3 d4) )
  ))
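
To make the meaning of the move operator concrete, the following Python
sketch applies it to a state given as a set of literals. It is only an
illustration: it applies the operator forward (precondition check, delete,
add), whereas DPlan applies operators backward, and the tuple encoding of
literals as well as the names `smaller_facts` and `move` are assumptions.

```python
DISKS = ["d1", "d2", "d3"]
PEGS = ["p1", "p2", "p3"]


def smaller_facts():
    """Static facts: every disk fits on every peg and on each larger disk."""
    facts = {("smaller", d, p) for d in DISKS for p in PEGS}
    facts |= {("smaller", a, b)
              for i, a in enumerate(DISKS) for b in DISKS[i + 1:]}
    return facts


def move(state, d, x, y):
    """Move disk d from x to y; return None if a precondition fails."""
    precondition = {("on", d, x), ("clear", d), ("clear", y),
                    ("smaller", d, y)}
    if not precondition <= state:
        return None
    add_list = {("on", d, y), ("clear", x)}
    del_list = {("on", d, x), ("clear", y)}
    return (state - del_list) | add_list


# The complete tower on peg p1, with pegs p2 and p3 empty.
start = frozenset({("on", "d3", "p1"), ("on", "d2", "d3"),
                   ("on", "d1", "d2"), ("clear", "d1"),
                   ("clear", "p2"), ("clear", "p3")} | smaller_facts())
after = move(start, "d1", "d2", "p2")
```

After the first move, d2 is clear and p2 is occupied, so d2 can be moved to
p3 but not to p2.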

For planning with update effects we extended PDDL as described in chap- 
ter 4. An example of a Tower of Hanoi problem is: 

(define (domain disk-world-domain)
  (:action move
    :parameters (?x ?y)
    :precondition ()
    :effect
    ((change (?L2 in (on ?y ?L2) (to (cdr ?L2)))
             (?L1 in (on ?x ?L1) (to (cons-car ?L1 ?L2)))
    ))
    :post (((on ?x ?L1) (on ?y ?L2))
           (f1 ?L2)          ; L2 is not empty
           (f6 ?L1 ?L2))     ; (car L2) < (car L1)
  ))

(defun cons-car (l1 l2)
  ; (print `(cons-car ,l1 ,l2 ,(cons (car l2) l1)))
  (cons (car l2) l1))

(defun f6 (l1 l2)
  ; (print `(f6 ,l1 ,l2))
  (cond ((null l1) T)
        (T (> (car l1) (car l2)))
  ))

(define (problem towers-of-hanoi)
  :domain 'disk-world-domain
  :goal ((on p1 []) (on p2 []) (on p3 [1 2 3]))
  )



A.4 Development of Folding Algorithms

Starting point for our development of a folding algorithm was the approach
by Summers (1977) and its extension from a subset of Lisp to term algebras
by Wysotzki (1983). Both approaches rely on the identification of differences
between pairs of traces in a given finite program. Since the original algorithm
of Summers was no longer available and there did not exist an implementation
for Wysotzki’s approach, we started from scratch:

— In the beginning (1997), a simple string pattern-matcher was realized
in Common Lisp by Imre Szabo in his diploma thesis
(http://ki.cs.tu-berlin.de/projects/dipl.html). This algorithm only worked for
detecting linear recursions with independent variable substitutions in the
recursive call.

— At the same time, an algorithm based on the identification of unifiable
sub-terms of a given finite program term was implemented in Common
Lisp by Dirk Matzke in his diploma thesis
(http://ki.cs.tu-berlin.de/projects/dipl.html). He used the
tree-transformation algorithm of Lu (1979) which was originally implemented in
a student project where it was applied to matching of different finite
programs in the context of programming by analogy (Schadler, Scheffer, Schmid,
& Wysotzki, 1995).

— In a student project in 1997, Martin Mühlpfordt and Markus Jurinka worked
on a formalization of folding based on induction of context-free tree
grammars (Mühlpfordt & Schmid, 1998) and Martin Mühlpfordt implemented
a pattern matcher in Prolog based on this formalization. This implementation
already works for recursive equations realizing arbitrary recursion
forms (linear, tree, combinations) and for interdependent variable
substitutions.

— Heike Pisch (2000) provided an efficient implementation of this approach in
her diploma thesis (http://ki.cs.tu-berlin.de/projects/dipl.html),
using a different approach for generating valid hypotheses for segmentations.

— The preceding two projects were the starting point for the powerful folding
algorithm realized in Common Lisp by Martin Mühlpfordt in his diploma
thesis 2000 (http://ki.cs.tu-berlin.de/projects/dipl.html) which
is presented in detail in chapter 7: the original approach was extended to
RPSs with sets of recursive equations and to the identification of hidden
variables in substitution terms.

A.5 Modules of TFold

TFold is implemented in Common Lisp (by Martin Mhlpfordt). The program 
together with example domains and further information can be obtained via 
http : / /ki . cs . tu-berlin. de/~schmid/IPAL/f old.html. 

- folder.lsp: main program. The outer control loop for search, as given in
  figure 7.13 in chapter 7.

  - Main function: search-RPS, with an initial tree as input and an RPS or
    nil as output. First, it tries to find a sub-program for the tree; if this
    fails, it tries to find an RPS for each argument of the top-level symbol
    of the tree.

  - Function: search-SubProg-Backtrack implements the search for a
    sub-program by backtracking, as described in figure 7.12 in chapter 7. The
    idea is to incrementally build skeletons of the subprogram body until a
    valid segmentation is found (a) which can be extended to the body and
    (b) for which the resulting instances of the variable can be explained
    by substitutions.

  - Some test functions (converting, folding, and printing the result).

  The initial tree is first converted into a special representation: all
  (maximal) sub-trees which do not contain “Omegas” are converted to a vector
  with 2 elements.

- base-fkt-rps.lsp: Functions for constructing an RPS from the induced
  components (signature, head, parameters, body, substitutions).

- base-fkt-tree.lsp: Functions for searching and partitioning trees, and for
  anti-unification of terms.

- skeleton.lsp: Constructing a skeleton and checking whether it is valid.

- substitutions.lsp: Calculating the substitutions for variables in
  recursive calls.

- term2gml.lsp: Transformation of terms into gml format for graphlet,
  which provides a graphical output.

- read-gml.lsp: Transformation of gml-represented terms into terms as lists
  of lists.

- pretty-rps-print.lsp: Functions for the output of RPSs.

- unfolder.lsp: Unfolding RPSs into finite programs.

- rps-example.lsp: Collection of RPSs.

Currently, a graphical user interface (GUI) realized in Java for the folder 
is under construction. 

RPSs are represented as a collection of sub-programs and a main program.
Sub-programs are represented as structures. An example for ModList, as
presented in chapter 7, is:

(defun ModList ()
  (setq Sub1 (make-sub :name 'ModList
                       :params '(n l)
                       :body '(if (empty l)
                                  nil
                                  (cons (if (eq0 (Mod n (head l)))
                                            true
                                            false)
                                        (ModList n (tail l))))
             ))
  (setq Sub2 (make-sub :name 'Mod
                       :params '(n k)
                       :body '(if (< n k) n (Mod (- n k) k))
             ))
  (make-rps :main '(ModList 8 (list 9 4 7))
            :sigma (list Sub1 Sub2)))
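
To make the behaviour of this RPS concrete, the two sub-programs can be transcribed directly into Python. This is only an illustrative sketch and not part of TFold: the names mod and mod_list are hypothetical, the main call is read here as ModList applied to 8 and the list (9 4 7), and the primitives empty, head, tail and eq0 are replaced by what are assumed to be their Python counterparts.

```python
def mod(n, k):
    """Mod(n, k) = if n < k then n else Mod(n - k, k); assumes k > 0."""
    return n if n < k else mod(n - k, k)


def mod_list(n, l):
    """ModList(n, l): for each element k of l, record whether Mod(n, k) is 0."""
    if not l:                       # (empty l)
        return []                   # nil
    head, tail = l[0], l[1:]
    return [mod(n, head) == 0] + mod_list(n, tail)


result = mod_list(8, [9, 4, 7])
print(result)  # [False, True, False]
```

Unfolding the two recursive equations by hand gives the same result: only 4 divides 8, so only the second entry is true.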



A.6 Time Effort of Folding

The time measures were performed on a SUN Sparc 20 workstation. For mea- 
surement the Common Lisp time macro was used for unfolding, the complete 
folding and the subprocesses calculating valid segmentations and determining 
substitutions. Because CPU-time measures are not overly precise, the average 
of three runs for each problem was calculated. 



A.7 Main Components of Plan-Transformation

Plan transformation is work in progress. The current implementation was
realized in Common Lisp by Ute Schmid, 2000 (plan-transform.lsp). Currently,
the implementation is being extended, and the concept of linearization based
on the assumption of sub-goal independence proposed in Wysotzki and Schmid
(2001) is being included by Janin Toussaint in her diploma thesis.

The main function of the current implementation is (plantransform
<plan>):

- Input: plan as list of psteps from dplan.lsp

- (decompose plan):
  initial decomposition; generation and initialization of the global variables
  subplanlist (list of tplan structures) and planstruc (sub-plan structure as
  term of sub-plan names)

- (intro-type subplanlist):
  data type inference for each sub-plan, successively filling the slots of tplan

- (ptransform subplanlist):
  introducing situation variables and rewriting into conditional expressions

- Output: transformation information as given in subplanlist and planstruc;
  tplan.term is passed to the generalization-to-n algorithm for each
  sub-plan; planstruc is extended from sub-plan names to arguments and
  rewritten into a “main” program






Global data-structure:

; input from dplan.lsp is a plan saved in plan-steps as a list of psteps
; global variables

(setq subplanlist nil)  ; list of all transformed subplans
                        ; (special case: one subplan)

(setq planstruc nil)    ; structure of global plan (subtrees replaced by
                        ; subplan names)

(setq mstlist nil)      ; list of possible minimal spanning trees
                        ; (backtrack-point for dags which are not sets!)

; new structure for transformed plan (one for each subplan)
; initial generation in decompose
(defstruct tplan
  pname      ; name of plan (decompose)
  suplan     ; plan as structure (plan-steps) (decompose)
  coplan     ; plan with data types and relevant predicates
  term       ; plan as term
  ptype      ; type of the plan (singleop, seq, set, list)
  newdat     ; *newly constructed datastructure (a list/set of objects)
  newpred    ; *newly constructed predicate (if newdat =/= nil) as pattern
  npfct      ; *function definition for new predicate
  goalpred   ; goal-predicate (might be newpred)
  bottom     ; bottom-element (might be newdat)
  passocs    ; const/constr rewrite pairs
  pafct      ; function definition for the rewriting
  pparams    ; input parameters
  pvalues    ; initial values
)
; * newdat, newpred, npfct are not filled for ptype = singleop and ptype = set
; newpred: p* (... CO ...) with CO as place-holder for complex object
;          maybe additional constant arguments
;          e.g.: (at* CO B) for rocket
; bottom == newdat (-> newdat might be superfluous)



A.8 Plan Decomposition

Please note: Extracts from the program are given in an abbreviated pseudo- 
Lisp notation, omitting implementation details! 

; call decompose with complete plan and level = 0 (root)

(decompose plan lv) ==
   (mapcar λx. (r-dec-p (get-first-op plan lv) x lv)
           (partition plan))

(get-first-op plan lv) ==
   get the first operator symbol at the uppermost level of the plan

(partition plan) == for each root of "plan", return its subplan

(r-dec-p op plan lv) ==
   (let ((dlv (disag-level op plan))
         (pname (gensym "P")))
     (cond ((not dlv) (save-subplan pname plan))     ; single plan
           ; subplan
           ((> dlv lv) (save-subplan pname (get-suplan dlv plan))
                       (decompose (append (make-root (1- dlv) plan)
                                          (rest-suplan dlv plan))
                                  dlv)
           )
           (T ; (= dlv lv)                           ; different operators
              (save-subplan pname plan)              ; at one level
           )                     ; currently not treated by decomposition
     ))

(disag-level op plan) ==
   returns the level in "plan" on which an operator name different from
   "op" appears for the first time

(save-subplan pname plan) ==
   generate a new tplan structure and instantiate the slots pname with
   "pname" and suplan with "plan" (see section A.7 for the tplan structure)

(get-suplan dlv plan) ==
   return a partial plan from the current top level to level "dlv"

(make-root lv plan) == get the leaf of the newly generated subplan

(rest-suplan dlv plan) ==
   return the remaining plan with level "dlv" as new top level
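
For the special case of a linear plan, with exactly one operator per level, the decomposition above amounts to cutting the plan wherever the operator name changes. The following Python sketch illustrates only this special case. The representation of a plan as a list of operator names ordered from the root downwards, the function name decompose_linear, the generated names P1, P2, ... (standing in for gensym), and the rocket-style operator names are all assumptions made for illustration; none of them is taken from plan-transform.lsp.

```python
from itertools import groupby


def decompose_linear(ops):
    """Split a linear plan into maximal runs of identical operator names.

    Each run plays the role of one sub-plan; a new run starts at the level
    where a different operator name appears for the first time."""
    subplans = {}
    for index, (_op, run) in enumerate(groupby(ops), start=1):
        subplans[f"P{index}"] = list(run)
    return subplans


plan = ["unload", "unload", "unload", "move", "load", "load", "load"]
print(decompose_linear(plan))
```

A plan whose levels all carry the same operator yields a single sub-plan, which corresponds to the first case of the cond in r-dec-p.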



A.9 Introduction of Situation Variables

For each subplan: 

(rewrite-to-term coplan) ==
   (list 'if (intro-s (first-state coplan))
             's (r-rewrite (rest coplan)))

(intro-s e) == extend e by an argument "s"

(r-rewrite coplan) ==
   (cond ((null coplan) 'omega)
         (T (append (intro-s (first-op coplan))
                    (list (list 'if
                                (intro-s (first-state coplan))
                                's (r-rewrite (rest coplan)))))))



A.10 Number of MSTs in a DAG

To calculate the number of minimal spanning trees in a DAG, we can use the
formula $\prod_{i=0}^{n} c_i$, multiplying the number of alternative choices
$c_i$ on each level $i$. For the sorting of three elements, we have
$1 \cdot 1 \cdot 9 = 9$, with a single option for the root and the first level
and 9 different options for the third level. To calculate the number of
options on a level, the connective structure of the DAG has to be known. The
different alternatives on one level can be calculated as described for the
construction of minimal spanning trees. For sorting lists with 4 elements we
have $1 \cdot 1 \cdot 52488 \cdot 46656 = 2{,}448{,}880{,}128$!
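
The two products quoted above can be checked mechanically. The short Python sketch below only multiplies the per-level numbers of alternatives given in the text; the function name count_msts is hypothetical and is unrelated to the Lisp function number-of-msts used in section A.11.

```python
from math import prod


def count_msts(choices_per_level):
    """Number of minimal spanning trees as the product of the number of
    alternative choices c_i on each level i."""
    return prod(choices_per_level)


print(count_msts([1, 1, 9]))               # 9
print(count_msts([1, 1, 52488, 46656]))    # 2448880128
```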

To omit the explicit calculation of options per level, an upper bound 
estimate is 




with $a_i$ as the number of arcs from level $i-1$ to level $i$ and $n_i$ as
the number of nodes on level $i$.

We cannot provide a formula for calculating the proportion of regular- 
izable trees to all trees, because regularizability depends on the identity of 
edge-labels which is variable between domains. 



A.11 Extracting Minimal Spanning Trees from a DAG



; lst is tree until lv (initially: root); sp is all plan-steps on levels > lv
(defun extract-trees (lst sp lv)
  (let* ((tvec (number-of-msts 1 sp (1+ lv)))
         (tcnt (reduce '* tvec)))
    (print `(There are * ,tvec = ,tcnt minimal spanning trees))
    (fresh-line)
    (cond ((> tcnt 576) (write-string "How many trees? <number> ")
                        (setq k (read-number))
                        (fresh-line)
                        (extract-msts (list lst) sp (1+ lv) k))
          (T (write-string "Generate all alternatives")
             (fresh-line)
             (extract-msts (list lst) sp (1+ lv) tcnt))
    )))

; lp: plan-steps on level lv; rp: plan-steps on all remaining levels
(defun extract-msts (lst sp lv k)
  (print `(Include next level ,lv))
  (let ((lp (remove-if #'(lambda (x) (/= lv (pstep-level x))) sp))
        (rp (remove-if #'(lambda (x) (= lv (pstep-level x))) sp)))
    (cond ((null lp) nil)
          ((null rp) (first-k k (lift
                       (mapcar #'(lambda (x) (pmerge lst x)) (all-combs lp k)))))
          ; in the last step this is only a "throw-away" of
          ; already calculated trees!
          (T (extract-msts (first-k k
                             (lift (mapcar #'(lambda (x) (pmerge lst x))
                                           (all-combs lp k))))
                           rp (1+ lv) k))
    )))

(defun pmerge (lst e)
  (cond ((null lst) (list e))     ; should never occur
        ((flatlst lst) (join lst e))
        (T (mapcar #'(lambda (x) (join x e)) lst))
  ))

(defun all-combs (lp k)
  (let* ((dc (make-sset (mapcar #'(lambda (x) (pstep-child x)) lp)))
         (lcs (child-split lp dc k))
         (tlcs (mapcar #'(lambda (x) (generate-trees x (1- (length x)))) lcs)))
    (cart-product (car tlcs) (cdr tlcs) k)
  ))

(defun child-split (lp dc k)
  (cond ((null dc) nil)
        (T (cons (first-k k (remove-if #'(lambda (x)
                               (not (setequal (pstep-child x) (car dc)))) lp))
                 (child-split lp (cdr dc) k)))
  ))

(defun generate-trees (lp cnt)
  (cond ((< cnt 0) nil)
        (T (setq cur-lp (copy-plan lp))
           (cons (nth cnt cur-lp)
                 (generate-trees lp (1- cnt))))
  ))

(defun cart-product (fst rst k)
  (cond ((null rst) fst)
        (T (cart-product (first-k k (comb fst (car rst))) (cdr rst) k))
  ))

(defun comb (f r)
  (cond ((null f) nil)
        (T (append (mapcar #'(lambda (x) (join (car f) x)) r)
                   (comb (cdr f) r)))
  ))
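
The core idea of the listing, combining the alternative parent arcs of every node by a Cartesian product and keeping at most k results, can be sketched compactly in Python. This is a simplified illustration and not a translation of the Lisp code: it assumes a DAG given as (parent, child) pairs whose arcs only connect adjacent levels, it does not proceed level by level, and the name spanning_trees is hypothetical.

```python
from itertools import islice, product


def spanning_trees(arcs, k=None):
    """Enumerate spanning trees of a levelled DAG given as (parent, child) arcs.

    Every non-root node keeps exactly one of its incoming arcs, so the trees
    are the Cartesian product of the per-node alternatives. At most k trees
    are returned (all of them if k is None)."""
    incoming = {}
    for parent, child in arcs:
        incoming.setdefault(child, []).append((parent, child))
    alternatives = list(incoming.values())
    return list(islice(product(*alternatives), k))


# Node "c" can be reached from "a" or from "b": two spanning trees.
dag = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c")]
for tree in spanning_trees(dag):
    print(tree)
```

The number of trees returned for k = None is the product of the in-degrees of all non-root nodes, which is the quantity computed per level in section A.10.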



A.12 Regularizing a Tree

(defun regularize-tree (mst lv)
  (let ((cur (remove-if #'(lambda (y) (/= lv (pstep-level y))) mst))
        (nxt (remove-if #'(lambda (y) (/= (1+ lv) (pstep-level y))) mst))
       )
    (cond ((null cur) nil)
          ((null nxt) mst)
          (T (let ((opsetcl (mapcar #'(lambda (x) (pstep-instop x)) cur))
                   (opsetnl (mapcar #'(lambda (x) (pstep-instop x)) nxt)))
               (cond ((subsetp opsetnl opsetcl :test 'equal)
                      (regularize-tree (smerge (lv-shift cur opsetnl)
                                               mst) (1+ lv)))
                     (T (regularize-tree mst (1+ lv)))
               ))) ) ) )

(defun lv-shift (cur opsetnl)
  (cond ((null cur) nil)
        ((member (pstep-instop (car cur)) opsetnl :test 'equal)
         (setf nc (copy-pstep (car cur)))
         (setf (pstep-parent nc)
               (id-insert (pstep-parent (car cur))))
         (setf (pstep-level nc) (1+ (pstep-level (car cur))))
         (cons
          (make-pstep
           :instop 'id
           :parent (pstep-parent (car cur))
           :child (id-insert (pstep-parent (car cur)))
           :nodeid (+ 1000 (pstep-nodeid (car cur)))
           :level (pstep-level (car cur))
          )
          (cons nc (lv-shift (cdr cur) opsetnl)))
        )
        (T (cons (car cur) (lv-shift (cdr cur) opsetnl)))
  ))

(defun id-insert (s)
  (cond ((null s) nil)
        ((idlist (car s)) (cons (cons 'id (car s)) (cdr s)))
        (T (cons '(id) s))
  ))

(defun idlist (l)
  (cond ((null l) T)
        ((equal 'id (car l)) (idlist (cdr l)))
        (T nil)
  ))



; if pstep-child of new is equal to a pstep-child in old -> keep new
; (has higher level)
; if pstep-child without "idlist" is equal to a pstep-child in old
; and both are on the same level -> keep old (the one which has
; no or a shorter "idlist")
; ==> parent-node for the nodes in nw with pstep-child of new as
;     parent has to be set to "old" parent!
;     these nodes are still in new because of the sequence of
;     node construction in lv-shift!
(defun smerge (nw old)
  (cond ((null nw) old)
        ((member (car nw) old :test 'node-equal)
         (smerge (cdr nw) (cons (car nw)
                                (remove-if #'(lambda (x) (node-equal
                                                          (car nw) x)) old))))
        ((member (car nw) old :test 'level-equal)
         (smerge (old-parent (car nw) (cdr nw) old) old))
        (T (smerge (cdr nw) (cons (car nw) old)))
  ))

(defun node-equal (s1 s2)
  (and (= (length (find-if #'(lambda (x) (idlist x)) (pstep-child s1)))
          (length (find-if #'(lambda (x) (idlist x)) (pstep-child s2)))
       )
       (setequal (remove-if #'(lambda (x) (idlist x)) (pstep-child s1))
                 (remove-if #'(lambda (x) (idlist x)) (pstep-child s2))
       )) )

(defun level-equal (s1 s2)
  (and (setequal (remove-if #'(lambda (x) (idlist x)) (pstep-child s1))
                 (remove-if #'(lambda (x) (idlist x)) (pstep-child s2))
       )
       (= (pstep-level s1) (pstep-level s2))
  ))

; if smerge keeps the old state, then the new state with cld as parent
; has to keep the corresponding old parent
(defun old-parent (cld s1 old)
  (cond ((null s1) nil)
        ((setequal (pstep-child cld) (pstep-parent (car s1)))
         (cons (update-par (copy-pstep (car s1))
                           (pstep-child (find-if #'(lambda (x)
                                                     (level-equal cld x)) old)))
               (cdr s1)))
        (T (old-parent cld (cdr s1) old))
  ))

(defun update-par (s1 ud) (setf (pstep-parent s1) ud) s1)






A.13 Programming by Analogy Algorithms

In contrast to other approaches to programming by analogy (in computer
science) and analogical problem solving (in cognitive science), our approach
handles analogy and abstraction in the same way: Recursive program
schemes can be defined over a signature consisting of a set of primitive
functions with fixed interpretation or over a signature containing
(second-order) function variables. Analogy is realized by mapping concrete
programs, while abstraction is realized by mapping a second-order scheme to a
concrete program. We use the same similarity criterion for retrieval and mapping.

Originally, we investigated tree-transformation; currently we are
investigating anti-unification as an approach to mapping. Tree-transformation
returns a quantitative measure of structural similarity based on the number
of substitutions, deletions, and insertions necessary to make a source program
identical to a target candidate. Tree-transformation is introduced in chapter
10 as an example for a measure of structural similarity. Anti-unification is
described in chapter 12.
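
The idea of counting substitutions, deletions, and insertions between two program trees can be illustrated with a small Python sketch. It implements a simplified top-down tree edit distance (in the style of Selkow's algorithm), in which relabelling a node costs 1 and inserting or deleting a whole subtree costs its number of nodes. This is an assumption made for illustration and is not claimed to be the measure defined in chapter 10; the tree representation (label, list of children) and the names size and distance are likewise hypothetical.

```python
def size(tree):
    """Number of nodes of a tree given as (label, [children])."""
    _label, children = tree
    return 1 + sum(size(child) for child in children)


def distance(t1, t2):
    """Top-down edit distance between two ordered, labelled trees."""
    (label1, kids1), (label2, kids2) = t1, t2
    rows, cols = len(kids1) + 1, len(kids2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + size(kids1[i - 1])        # delete subtree
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + size(kids2[j - 1])        # insert subtree
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + size(kids1[i - 1]),
                d[i][j - 1] + size(kids2[j - 1]),
                d[i - 1][j - 1] + distance(kids1[i - 1], kids2[j - 1]),
            )
    return int(label1 != label2) + d[-1][-1]


source = ("+", [("x", []), ("1", [])])
target = ("*", [("x", []), ("1", [])])
print(distance(source, target))  # 1 (one substitution at the root)
```

A distance of 0 means that the two trees are identical; small distances indicate that few edit operations are needed to turn the source into the target.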

We did some preliminary work on the adaptation of non-isomorphic programs.
Again in contrast to other approaches, our focus is on generalization over
programs, that is, on constructing a hierarchy of recursive program schemes.
An overview is given at http://ki.cs.tu-berlin.de/~schmid/IPAL/aprog.html.

Several aspects of programming by analogy and analogical problem solving
were investigated:

- Memory organization and retrieval using tree-transformation was
  investigated by Mark Muller in his diploma thesis (Aug. 1997,
  http://ki.cs.tu-berlin.de/projects/dipl.html). He used hierarchical cluster
  analysis to build a hierarchical memory and could show that memory
  organization resulted in a grouping of programs sharing recursive structures
  (tail recursion, linear recursion, tree recursion).

- In two psychological experiments, Knuth Polkehn and Joachim Wirth
  investigated the use of non-isomorphic base problems by human problem
  solvers (Sept. 1997 and May 1998,
  http://ki.cs.tu-berlin.de/projects/dipl.html). They could show that human
  problem solvers can solve new problems if base and target have a structural
  overlap of at least fifty percent. For our automatic programming system, we
  take this result as a hint that non-isomorphic sources should be considered
  for adaptation. However, we need a (qualitative) criterion for deciding
  whether the source candidate is “similar enough” to prefer adaptation over
  folding from scratch.

- Adaptation of program schemes to new problems was researched by Rene
  Mercy in his diploma thesis (April 1998,
  http://ki.cs.tu-berlin.de/projects/dipl.html). He used tree-transformation
  to construct a mapping. If the mapping only contains unique substitutions,
  the programs are isomorphic and adaptation is realized simply by replacing
  the symbols in the base program by the symbols they are mapped upon in the
  target. For non-unique substitutions, deletions and insertions, context in
  the program tree is used to determine the adaptation of the source program
  (see chapter 12).

- If base and target share some similarity, but adaptation fails, one could
  try to rewrite the base problem to obtain a better mapping. That is,
  structure mapping is enriched by a background theory. This idea is
  currently discussed in cognitive science as “re-representation”, and Martin
  Mühlpfordt investigated re-representation in a psychological experiment in
  his diploma thesis (June 1999,
  http://ki.cs.tu-berlin.de/projects/dipl.html).

- In 2000 we replaced tree-transformation by (second-order) anti-unification
  as the backbone of programming by analogy. Thereby we gained a
  parameter-free, qualitative approach to structural similarity which
  furthermore allowed us to deal with mapping and generalization in a uniform
  way. Adaptation can be modeled by the inverse process, that is,
  (second-order) unification. Mapping and generalization were researched by
  Uwe Sinha in his diploma thesis (Aug. 2000,
  http://ki.cs.tu-berlin.de/projects/dipl.html) and are described in
  chapter 12.

- Currently, Ulrich Wagner is investigating retrieval and adaptation.
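
To give an impression of how anti-unification yields both a mapping and a generalization, the following Python sketch computes the least general generalization of two terms. It is deliberately restricted to first-order anti-unification, whereas the approach described above uses second-order anti-unification (chapter 12), so it cannot generalize over function symbols. The term representation (strings for constants and variables, tuples for applications), the function name anti_unify, and the variable names X1, X2, ... are assumptions made for illustration.

```python
def anti_unify(s, t, table=None):
    """First-order least general generalization of the terms s and t.

    A term is a string or a tuple (symbol, arg1, ..., argn). The table maps
    each pair of differing subterms to the variable that replaces it."""
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        args = tuple(anti_unify(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Differing subterms: the same pair always receives the same variable.
    if (s, t) not in table:
        table[(s, t)] = f"X{len(table) + 1}"
    return table[(s, t)]


fac = ("if", ("eq0", "x"), "1", ("*", "x", ("fac", ("pred", "x"))))
total = ("if", ("eq0", "x"), "0", ("+", "x", ("sum", ("pred", "x"))))
print(anti_unify(fac, total))
# ('if', ('eq0', 'x'), 'X1', 'X2')
```

Because the two recursive branches start with different function symbols, the first-order result collapses them into a single variable X2. A second-order scheme can instead keep the common structure and abstract only from the differing function symbols.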
B.1 Fixpoint Semantics

Fixpoint theory in general concerns the following question: Given an ordered
set $P$ and an order-preserving map $\Phi : P \to P$, does there exist a
fixpoint for $\Phi$? A point $x \in P$ is called a fixpoint if $\Phi(x) = x$.

The following introduction of fixpoint semantics closely follows Davey and 
Priestley (1990). We first introduce some basic concepts and notations. Then, 
we present the fixpoint theorem for complete partial orders. Finally, we give 
an illustration for the factorial function. 

Definition B.1.1 (Map). (Davey & Priestley, 1990, 1.6) Let $X$ be a set
and consider a map $f : X \to X$. $f$ assigns a member $f(x) \in X$ to each
member $x \in X$ and is determined by its graph,
$\mathrm{graph}\, f = \{(x, f(x)) \mid x \in X\}$ with
$\mathrm{graph}\, f \subseteq X \times X$.

A partial map is a map $\sigma : S \to X$ where $S \subseteq X$. With
$\mathrm{dom}\,\sigma = S$ we denote the domain of $\sigma$. The set of partial
maps on $X$ is denoted $(X \rightharpoonup X)$ and is ordered in the following
way: given $\sigma, \tau \in (X \rightharpoonup X)$, define $\sigma \le \tau$
iff $\mathrm{dom}\,\sigma \subseteq \mathrm{dom}\,\tau$ and
$\sigma(x) = \tau(x)$ for all $x \in \mathrm{dom}\,\sigma$. A subset $G$ of
$X \times X$ is the graph of a partial map iff
$\forall s \in X : (s, x) \in G$ and $(s, x') \in G \Rightarrow x = x'$.

Definition B.1.2 (Order-Preserving Map). (Davey & Priestley, 1990,
1.11) Let $P$ and $Q$ be ordered sets. A map $\varphi : P \to Q$ is
order-preserving (or monotone) if $x \le y$ in $P$ implies
$\varphi(x) \le \varphi(y)$ in $Q$.

Definition B.1.3 (Bottom Element). (Davey & Priestley, 1990, 1.23, 1.24)
Let $P$ be an ordered set. The least element of $P$, if such exists, is called
the bottom element and denoted $\bot$.

Given an ordered set $P$ (with or without $\bot$), we form $P_\bot$ (called
“$P$ lifted”) as follows: Take an element $\mathbf{0} \notin P$, construct
$P_\bot = P \cup \{\mathbf{0}\}$ and define $\le$ on $P_\bot$ by $x \le y$ iff
$x = \mathbf{0}$ or $x \le y$ in $P$.

Definition B.1.4 (Order-Isomorphism between Partial and Strict
Maps). (Davey & Priestley, 1990, 1.29) With each
$\pi \in (S \rightharpoonup S)$ we associate a map
$\psi(\pi) : S \to S_\bot$, given by $\psi(\pi) = \pi_\bot$ where

$$\pi_\bot(x) = \begin{cases} \pi(x) & \text{if } x \in \mathrm{dom}\,\pi \\ \mathbf{0} & \text{otherwise.} \end{cases}$$



U. Schmid: Inductive Synthesis of Functional Programs, LNAI 2654, pp. 357-367, 2003. 
Thus $\psi$ is a map from $(S \rightharpoonup S)$ to $(S \to S_\bot)$. We have
used the extra element $\mathbf{0}$ to convert a partial map on $S$ into a
total map. $\psi$ sets up an order-isomorphism between
$(S \rightharpoonup S)$ and $(S \to S_\bot)$ (Davey & Priestley, 1990, p. 21).

Definition B.1.5 (Supremum). (Davey & Priestley, 1990, 2.1) Let $P$ be
an ordered set and let $S \subseteq P$. An element $x \in P$ is an upper bound
of $S$ if $s \le x$ for all $s \in S$. $x$ is called least upper bound of $S$ if

1. $x$ is an upper bound of $S$, and

2. $x \le y$ for all upper bounds $y$ of $S$.

Since least elements are unique, the least upper bound is unique if it exists.
The least upper bound is also called the supremum of $S$ and is denoted
$\sup S$, or $\bigvee S$ (“join of $S$”), or $\bigsqcup S$ if $S$ is directed.

Definition B.1.6 (CPO). (Davey & Priestley, 1990, 3.9) An ordered set
$P$ is a CPO (complete partial order) if

1. $P$ has a bottom element $\bot$,

2. $\sup D$ exists for each directed subset $D$ of $P$.

The simplest example of a directed set is a chain, such as
$0 \le succ(0) \le succ(succ(0)) \le succ(succ(succ(0))) \le \ldots$.
Fixpoint theorems are based on ascending chains.

Definition B.1.7 (Continuous and Strict Maps). (Davey & Priestley,
1990, 3.14) A map $\varphi : P \to Q$ is continuous, if for every directed set
$D$ in $P$: $\varphi(\bigsqcup D) = \bigsqcup(\varphi(D))$. Every continuous
map is order preserving (Davey & Priestley, 1990, 3.15).

A map $\varphi : P \to Q$ such that $\varphi(\bot) = \bot$ is called strict.

Definition B.1.8 (Fixpoint). (Davey & Priestley, 1990, 4.4) Let $P$ be an
ordered set and let $\Phi : P \to P$ be a map. We say $x \in P$ is a fixpoint
of $\Phi$ if $\Phi(x) = x$. The set of all fixpoints of $\Phi$ is denoted by
$\mathrm{fix}(\Phi)$; it carries the induced order. The least element of
$\mathrm{fix}(\Phi)$, when it exists, is denoted by $\mu(\Phi)$.
The $n$-fold composite, $\Phi^n$, of a map $\Phi : P \to P$ is defined as
follows: $\Phi^n$ is the identity map if $n = 0$ and
$\Phi^n = \Phi \circ \Phi^{n-1}$ for $n \ge 1$. If $\Phi$ is order preserving,
so is $\Phi^n$.

Theorem B.1.1 (CPO Fixpoint). (Davey & Priestley, 1990, 4.5) Let $P$
be a complete partial order (CPO), let $\Phi : P \to P$ be an order-preserving
map and define $\alpha = \bigsqcup_{n \ge 0} \Phi^n(\bot)$.

1. If $\alpha \in \mathrm{fix}(\Phi)$, then $\alpha = \mu(\Phi)$.

2. If $\Phi$ is continuous then the least fixpoint $\mu(\Phi)$ exists and
   equals $\alpha$.
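Proof (CPO Fixpoint). (Davey & Priestley, 1990, 4.5)

1. Certainly $\bot \le \Phi(\bot)$. Applying the order-preserving map
   $\Phi^n$, we have $\Phi^n(\bot) \le \Phi^{n+1}(\bot)$ for all $n$. Hence we
   have a chain
   $\bot \le \Phi(\bot) \le \ldots \le \Phi^n(\bot) \le \Phi^{n+1}(\bot) \le \ldots$
   in $P$. Since $P$ is a CPO, $\alpha = \bigsqcup_{n \ge 0} \Phi^n(\bot)$
   exists. Let $\beta$ be any fixpoint of $\Phi$. By induction,
   $\Phi^n(\beta) = \beta$ for all $n$. We have $\bot \le \beta$, hence we
   obtain $\Phi^n(\bot) \le \Phi^n(\beta) = \beta$ by applying $\Phi^n$. The
   definition of $\alpha$ forces $\alpha \le \beta$. Hence if $\alpha$ is a
   fixpoint then it is the least fixpoint.

2. It will be enough to show that $\alpha \in \mathrm{fix}(\Phi)$. We have

$$\begin{aligned}
\Phi\Big(\bigsqcup_{n \ge 0} \Phi^n(\bot)\Big) &= \bigsqcup_{n \ge 0} \Phi(\Phi^n(\bot)) && \text{(since $\Phi$ is continuous)} \\
&= \bigsqcup_{n \ge 1} \Phi^n(\bot) \\
&= \bigsqcup_{n \ge 0} \Phi^n(\bot) && \text{(since $\bot \le \Phi^n(\bot)$ for all $n$).}
\end{aligned}$$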

Consider the factorial function $\mathrm{fact}(x) = x!$ with its recursive
definition:

$$\mathrm{fact}(x) = \begin{cases} 1 & \text{if } x = 0 \\ x \cdot \mathrm{fact}(x-1) & \text{if } x \ge 1. \end{cases}$$

To each map $f : \mathbb{N}_0 \to \mathbb{N}_0$ we may associate a new map
$\bar{f}$ given by:

$$\bar{f}(x) = \begin{cases} 1 & \text{if } x = 0 \\ x \cdot f(x-1) & \text{if } x \ge 1. \end{cases}$$

The equation satisfied by fact can be reformulated as $\Phi(f) = f$, where
$\Phi(f) = \bar{f}$. The entire factorial function cannot be unwound from its
recursive specification in a finite number of steps. However we can, for each
$n = 0, 1, \ldots$, determine in a finite number of steps the partial map $f_n$
which is a restriction of fact to $\{0, 1, \ldots, n\}$. The graph of $f_n$ is
$\{(0, 1), (1, 1), \ldots, (n, n!)\}$. Therefore, to accommodate approximations
of fact we can consider all partial maps on $\mathbb{N}_0$ and regard $\bar{f}$
as having the domain $\{0\} \cup \{k \mid k - 1 \in \mathrm{dom}\, f\}$.
Similarly, we can work with all maps from $\mathbb{N}_0$ to
$(\mathbb{N}_0)_\bot$ and take $\bar{f}(k) = \bot$ precisely when
$f(k-1) = \bot$. The factorial function can be regarded as a solution of a
recursive equation $\Phi(f) = f$, where
$f \in (\mathbb{N}_0 \rightharpoonup \mathbb{N}_0)$, or equivalently can be
regarded as a solution of a recursive equation $\Phi(f) = f$, where
$f \in (\mathbb{N}_0 \to (\mathbb{N}_0)_\bot)$:

$$(\Phi(f))(x) = \begin{cases} 1 & \text{if } x = 0 \\ x \cdot f(x-1) & \text{if } x \ge 1. \end{cases}$$

The domain $P$ of $\Phi$, as given in the fixpoint theorem above, is
$(\mathbb{N}_0 \rightharpoonup \mathbb{N}_0)$. That is, $\Phi$ maps partial
maps to partial maps. The bottom element of
$(\mathbb{N}_0 \rightharpoonup \mathbb{N}_0)$ is $\emptyset$. This corresponds
to that map in $(\mathbb{N}_0 \to (\mathbb{N}_0)_\bot)$ which sends every
element to $\bot$. We denote this map by $\bot$.

It is obvious that $\Phi$ is order preserving. We have
$\mathrm{graph}\,\Phi(\bot) = \{(0, 1)\}$,
$\mathrm{graph}\,\Phi^2(\bot) = \{(0, 1), (1, 1)\}$ and so on. An easy
induction confirms that $f_n = \Phi^{n+1}(\bot)$ for all $n$, where $\{f_n\}$
is the sequence of partial maps (“Kleene sequence”) which approximate fact.
Taking the union of all graphs, we see that
$\bigsqcup_{n \ge 0} \Phi^n(\bot)$ is the map $\mathrm{fact} : x \mapsto x!$ on
$\mathbb{N}_0$. This is the least fixpoint of $\Phi$ in
$(\mathbb{N}_0 \rightharpoonup \mathbb{N}_0)$.
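
The Kleene sequence for the factorial functional can be computed directly when partial maps on the natural numbers are represented by their finite graphs. The Python sketch below is only an illustration of the construction above; representing a partial map as a dictionary and the names phi and kleene are assumptions.

```python
def phi(f):
    """One application of the functional Phi to a partial map f.

    f is a dict representing the graph of a partial map; the result is
    defined on {0} together with {k | k - 1 in dom f}."""
    g = {0: 1}
    for k_minus_1, value in f.items():
        k = k_minus_1 + 1
        g[k] = k * value
    return g


def kleene(n):
    """Return Phi^n(bottom), where bottom is the empty partial map."""
    f = {}
    for _ in range(n):
        f = phi(f)
    return f


print(kleene(1))  # {0: 1}
print(kleene(2))  # {0: 1, 1: 1}
print(kleene(4))  # {0: 1, 1: 1, 2: 2, 3: 6}
```

Each iterate extends the previous one by exactly one argument, so the iterates form an ascending chain whose union is the graph of the factorial function.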



B.2 Proof: Maximal Subprogram Body 

Theorem 7.3.1 states that if a recursive subprogram $G$ exists for a set of
initial trees $T_{init} = \{t_e \mid e \in E\}$ which can generate all
$t_e \in T_{init}$ for a given initial instantiation $\beta_e$, then there also
exists a subprogram $G'$ such that the instantiations (over all segments and
all examples) do not share a non-trivial pattern (a common prefix). Therefore,
it is feasible to generate a hypothesis about the subprogram body for a given
segmentation with recursion points $U_{rec}$ by calculating the maximal
pattern of the segments.

Let $U_{rec} = \mathrm{pos}(t_G, G)$ be the set of recursion points in $G$.
Because $G$ is a recursive subprogram, $U_{rec} \neq \emptyset$ holds. Let
$U_{rec}$ be indexed over $R = \{1, \ldots, |U_{rec}|\}$ and let $W$ be the set
of unfolding indices over $R$ (def. 7.1.28). Let the body of the subprogram $G$
be $\hat{t}_G = t_G[U_{rec} \leftarrow G^0]$. That is, the recursive calls are
replaced by the function variable $G$ with arity 0, and therefore the
substitutions in the recursive calls are not given in $\hat{t}_G$. Let
$\mathbf{sub} : X \times R \to T_\Sigma(X)$ be the substitution term of
variable $x_i \in X$ in the $r$-th recursive call ($r \in R$) with
$\mathbf{sub}(x_i, r) = t_G|_{u_r \circ i}$ (def. 7.1.27).

For better readability we introduce unfolding indices $w \in W$ and example
$e \in E$ as parameters of the instantiation
$\beta : X \times W \times E \to T_\Sigma$. We will use variable names with
indices to indicate their position. If position is not relevant, variables are
written without indices.

The instantiation $\beta : X \times W \times E \to T_\Sigma$ of a variable
$x \in X$ in an unfolding $w \in W$ in example $e \in E$ is defined by the
initial instantiation of this variable and by the substitution terms for this
variable. In the first unfolding $w = \lambda$, $\beta$ is the initial
instantiation for the given example. Further instantiations can be generated
by substitutions:

Definition B.2.1 (Generating Instantiations). From instantiations in
unfolding $w$ of an example $e$, the instantiation in an unfolding $w \circ r$
can be generated by application of the substitution:

1. $\beta(x, \lambda, e) = \beta_e(x)$.

2. For all $w \in W$, $r \in R$, $e \in E$:
   $\beta(x, w \circ r, e) = \mathbf{sub}(x, r)\{x_1 \leftarrow \beta(x_1, w, e), \ldots, x_n \leftarrow \beta(x_n, w, e)\}$.

There can exist substitutions for $x$ with $\mathbf{sub}(x, r) \in X$.
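
Definition B.2.1 can be read as a recursive procedure over the unfolding index. The following Python sketch illustrates it for a single example, using the substitutions of the sub-program Mod from section A.5 (n is replaced by (- n k), k by itself). The term representation (strings for variables and constants, tuples for applications), the encoding of an unfolding index as a tuple of call indices, and the names substitute, beta, mod_sub and mod_init are assumptions made for illustration.

```python
def substitute(term, env):
    """Simultaneously replace every variable of term by its value in env."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(arg, env) for arg in term[1:])
    return env.get(term, term)


def beta(x, w, sub, beta_e):
    """Instantiation of variable x in unfolding w for one example.

    w is a tuple of recursive-call indices (the empty tuple plays the role
    of lambda), sub maps (variable, r) to a substitution term, and beta_e
    maps each variable to its initial instantiation."""
    if not w:
        return beta_e[x]
    *prefix, r = w
    env = {y: beta(y, tuple(prefix), sub, beta_e) for y in beta_e}
    return substitute(sub[(x, r)], env)


mod_sub = {("n", 1): ("-", "n", "k"), ("k", 1): "k"}
mod_init = {"n": "8", "k": "4"}

print(beta("n", (1,), mod_sub, mod_init))     # ('-', '8', '4')
print(beta("n", (1, 1), mod_sub, mod_init))   # ('-', ('-', '8', '4'), '4')
print(beta("k", (1, 1), mod_sub, mod_init))   # 4
```

The variable k is an instance of the case in which the substitution term is itself a variable: its instantiation stays the same in every unfolding.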

Lemma B.2.1 (Identical Instantiations). If for $x \in X$ there exists a
substitution $\mathbf{sub}(x, r) = x'$, $r \in R$, $x' \in X$, it holds for all
$e \in E$: $\forall w \in W\ \exists v \in W$ with
$\beta(x', w, e) = \beta(x, v, e)$.

Proof (Identical Instantiations). If for an $r \in R$ it holds that
$\mathbf{sub}(x, r) = x'$ with $x' \in X$, then it follows for all $e \in E$
and $w \in W$:

$$\begin{aligned}
\beta(x, w \circ r, e) &= \mathbf{sub}(x, r)\{x_1 \leftarrow \beta(x_1, w, e), \ldots, x_n \leftarrow \beta(x_n, w, e)\} \\
&= x'\{x_1 \leftarrow \beta(x_1, w, e), \ldots, x_n \leftarrow \beta(x_n, w, e)\} \\
&= \beta(x', w, e),
\end{aligned}$$

and therefore for each $e \in E$ and $w \in W$ there exists $v \in W$
($v = w \circ r$) such that $\beta(x', w, e) = \beta(x, v, e)$.

From lemma B.2.1 follows: If a substitution of $x$ is a replacement of $x$ by
another variable $x'$ then in each example the instantiation of $x'$ is also an
instantiation of $x$. In general, there can exist several substitutions for
$x$ where $x$ is replaced by $x'$ and, additionally, for $x'$ there can be
further substitutions $\mathbf{sub}(x', r) \in X$. We define the set of
instantiation generating variables for $x$ and generalize lemma B.2.1.

Definition B.2.2 (Instantiation Generating Variables). We define
$X_c(x)$ as the minimal set of variables which directly generate
instantiations of $x$ for which holds:

1. $x \in X_c(x)$.

2. If $x' \in X_c(x)$ then
   $\{\mathbf{sub}(x', r) \mid r \in R,\ \mathbf{sub}(x', r) \in X\} \subseteq X_c(x)$.
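
Since the set of instantiation generating variables is defined as the smallest set closed under the two conditions above, it can be computed by a simple closure loop. The Python sketch below illustrates this under assumed representations: sub is a dictionary from (variable, call index) to a term, terms are strings or tuples, and the function name generating_variables is hypothetical.

```python
def generating_variables(x, sub, variables, indices):
    """Smallest set that contains x and, with every member y, all terms
    sub[(y, r)] that are themselves variables."""
    result, todo = {x}, [x]
    while todo:
        current = todo.pop()
        for r in indices:
            term = sub[(current, r)]
            if term in variables and term not in result:
                result.add(term)
                todo.append(term)
    return result


variables = {"x", "y", "z"}
sub = {("x", 1): "y", ("y", 1): "x", ("z", 1): ("succ", "z")}
print(sorted(generating_variables("x", sub, variables, [1])))  # ['x', 'y']
print(sorted(generating_variables("z", sub, variables, [1])))  # ['z']
```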

Lemma B.2.2 (Continuation of Identical Instantiations). Let $X_c(x)$
be the set of variables which directly generate instantiations of $x$. For all
$x' \in X_c(x)$ and all $e \in E$ holds: $\forall w \in W\ \exists v \in W$ with
$\beta(x', w, e) = \beta(x, v, e)$.

Proof (Continuation of Identical Instantiations). For the given definition of
$X_c(x)$ it must be shown that (1) lemma B.2.2 holds for $x$, and (2) if
$x' \in X_c(x)$ then lemma B.2.2 holds for all
$x'' \in \{\mathbf{sub}(x', r) \mid r \in R,\ \mathbf{sub}(x', r) \in X\}$.

1. For $x \in X_c(x)$ holds: for all $e \in E$ and $w \in W$ there exists
   $v \in W$ with $v = w$ such that $\beta(x, w, e) = \beta(x, v, e)$.

2. Let $x' \in X_c(x)$. By lemma B.2.1 it holds for each
   $x'' \in \{\mathbf{sub}(x', r) \mid r \in R,\ \mathbf{sub}(x', r) \in X\}$
   that for each $e \in E$ and $w \in W$ there exists a $v' \in W$ with
   $\beta(x'', w, e) = \beta(x', v', e)$. Because $x' \in X_c(x)$, for each
   $\beta(x', v', e)$ there must exist a $v \in W$ with
   $\beta(x', v', e) = \beta(x, v, e)$. Therefore, for each $e \in E$ and
   $w \in W$ there exists $v \in W$ with $\beta(x'', w, e) = \beta(x, v, e)$.

From lemma B.2.2 follows: In each example, each instantiation of a
variable from $X_c(x)$ is an instantiation of $x$. If there exists a
non-trivial pattern $p$ of instantiations of $x$, then it must hold that
$p \le \beta(x, w, e)$ for all $w \in W$ and $e \in E$. Based on lemma B.2.2,
for all $x' \in X_c(x)$ holds:

Definition B.2.3 (Non-Trivial Pattern of an Instantiation). For all
$x' \in X_c(x)$ holds for a non-trivial pattern $p$ of instantiations of $x$:

1. $p \le \beta_e(x')$ for all $e \in E$, and

2. $p \le \mathbf{sub}(x', r)\{x_1 \leftarrow \beta(x_1, w, e), \ldots, x_n \leftarrow \beta(x_n, w, e)\}$
   for all $e \in E$, $w \in W$, and $r \in R$ with
   $\mathbf{sub}(x', r) \notin X$.

Furthermore, there must exist a non-trivial pattern $p'$ such that for all
$x' \in X_c(x)$ holds:

3. $p' \le \beta_e(x')$ for all $e \in E$, and

4. $p' \le \mathbf{sub}(x', r)$ for all $r \in R$ with
   $\mathbf{sub}(x', r) \notin X$.

The proof of theorem 7.3.1 is divided into two cases: (1) $p'$ does not
contain variables, and (2) $p'$ does contain variables.

(1) $p'$ does not contain variables. From $\mathrm{var}(p') = \emptyset$
follows

- $p' = \beta_e(x')$ for all $e \in E$, and

- $p' = \mathbf{sub}(x', r)$ for all $r \in R$ with
  $\mathbf{sub}(x', r) \notin X$, and therefore

- $p' = \beta(x', w, e)$ for all $x' \in X_c(x)$, $e \in E$, $w \in W$.

That means that instantiations of $x' \in X_c(x)$ are identical in all
unfoldings over all examples. Therefore, it holds for all $w \in W$ and
$e \in E$:
$\hat{t}_G\{x' \leftarrow \beta(x', w, e)\} = \hat{t}_G\{x' \leftarrow p'\}$.
Consequently, variables $x' \in X_c(x)$ can be replaced by $p'$ in the program
body.

Furthermore, for all variables $x'' \in X$, $e \in E$, $w \in W$, and
$r \in R$ holds:

$$\begin{aligned}
\beta(x'', w \circ r, e) &= \mathbf{sub}(x'', r)\,\{x' \leftarrow \beta(x', w, e) \mid x' \in X_c(x)\}\,\{x' \leftarrow \beta(x', w, e) \mid x' \in X \setminus X_c(x)\} \\
&= \mathbf{sub}(x'', r)\,\{x' \leftarrow p' \mid x' \in X_c(x)\}\,\{x' \leftarrow \beta(x', w, e) \mid x' \in X \setminus X_c(x)\}.
\end{aligned}$$

That is, in all substitutions the variables from $X_c(x)$ can be replaced by
$p'$ and these new substitutions generate identical instantiations in all
unfoldings over all examples.

Let $X' = X \setminus X_c(x)$ and $m = |X'|$. Because $|X_c(x)| > 0$, it holds
that $|X'| < |X|$. Let $i_1, \ldots, i_m$ be the indices of variables in $X'$.
A new subprogram $G'$ can be constructed which does not need the variables
from $X_c(x)$:

Head: $G'(x_{i_1}, \ldots, x_{i_m})$.

Recursive Calls: Each recursive call uses the new program name $G'$ and the
new substitutions. The call at position $u_r \in U_{rec}$, $r \in R$ is:

$$G'\big(\mathbf{sub}(x_{i_1}, r)\{x' \leftarrow p' \mid x' \in X_c(x)\}, \ldots, \mathbf{sub}(x_{i_m}, r)\{x' \leftarrow p' \mid x' \in X_c(x)\}\big).$$

Maximal Pattern of Body: The maximal patterns of the instantiations can
be included in the body:
$\hat{t}_{G'} = \hat{t}_G\{x' \leftarrow p' \mid x' \in X_c(x)\}$.

Body: The program body can be constructed as:

$$\begin{aligned}
t_{G'} = \hat{t}_G\,&\{x' \leftarrow p' \mid x' \in X_c(x)\} \\
&[u_r \leftarrow G'\big(\mathbf{sub}(x_{i_1}, r)\{x' \leftarrow p' \mid x' \in X_c(x)\}, \ldots, \mathbf{sub}(x_{i_m}, r)\{x' \leftarrow p' \mid x' \in X_c(x)\}\big) \mid u_r \in U_{rec}].
\end{aligned}$$

The initial instantiations of variables $x' \in X_c(x)$ are no longer used in
the examples, because these variables do not belong to the new subprogram. The
initial instantiations of the remaining variables in $X'$ are unaffected. With
the reduced initial instantiations the new subprogram still has the same
language for each example.

(2) $p'$ does contain variables. Let $|\mathrm{var}(p')| = m$ be the number of
variables in $p'$. For each variable $x_j \in X_c(x)$ new variables
$x_{j1}, \ldots, x_{jm}$ are constructed. The new variable set is
$X' = \{x_{jk} \mid x_j \in X_c(x),\ k = 1, \ldots, m\} \cup X \setminus X_c(x)$.
For each $x_j \in X_c(x)$, for all $e \in E$ and $w \in W$, there must exist
terms $\beta(x_{j1}, w, e), \ldots, \beta(x_{jm}, w, e) \in T_\Sigma$ such that
$\beta(x_j, w, e) = p'[\beta(x_{j1}, w, e), \ldots, \beta(x_{jm}, w, e)]$
holds, because $p'$ is a non-trivial pattern of each $\beta(x_j, w, e)$.

Furthermore, for each $x_j \in X_c(x)$ and each $r \in R$ with
$\mathbf{sub}(x_j, r) \notin X$ there must exist terms
$\mathbf{sub}(x_{j1}, r), \ldots, \mathbf{sub}(x_{jm}, r) \in T_\Sigma$ such
that
$\mathbf{sub}(x_j, r) = p'[\mathbf{sub}(x_{j1}, r), \ldots, \mathbf{sub}(x_{jm}, r)]$
holds, because $p'$ is a non-trivial pattern of each $\mathbf{sub}(x_j, r)$.

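For all $w \in W$, $e \in E$, and $r \in R$ holds

\[
\begin{aligned}
\beta(x_j, w \circ r, e) &= p'[\beta(x_{j1}, w \circ r, e), \ldots, \beta(x_{jm}, w \circ r, e)] \\
&= \mathrm{sub}(x_j, r)\{x_1 \leftarrow \beta(x_1,w,e), \ldots, x_n \leftarrow \beta(x_n,w,e)\} \\
&= p'[\mathrm{sub}(x_{j1},r), \ldots, \mathrm{sub}(x_{jm},r)]\{x_1 \leftarrow \beta(x_1,w,e), \ldots, x_n \leftarrow \beta(x_n,w,e)\} \\
&= p'[\mathrm{sub}(x_{j1},r)\{x_1 \leftarrow \beta(x_1,w,e), \ldots, x_n \leftarrow \beta(x_n,w,e)\}, \ldots, \\
&\qquad\ \mathrm{sub}(x_{jm},r)\{x_1 \leftarrow \beta(x_1,w,e), \ldots, x_n \leftarrow \beta(x_n,w,e)\}].
\end{aligned}
\]

It follows for $k = 1, \ldots, m$, $w \in W$, $e \in E$, and $r \in R$:

\[
\beta(x_{jk}, w \circ r, e) = \mathrm{sub}(x_{jk}, r)\{x_1 \leftarrow \beta(x_1,w,e), \ldots, x_n \leftarrow \beta(x_n,w,e)\}.
\]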

Consequently, the instantiations of all variables $x_{jk}$ can be generated by the (new) substitutions $\mathrm{sub}(x_{jk},r)$ in all unfoldings for all examples.

For all $x' \in X'$, $w \in W$, $e \in E$, and $r \in R$ holds:

\[
\begin{aligned}
\beta(x', w \circ r, e)
&= \mathrm{sub}(x', r)\{x_j \leftarrow \beta(x_j,w,e) \mid x_j \in X_c(x)\}\{x_j \leftarrow \beta(x_j,w,e) \mid x_j \in X \setminus X_c(x)\} \\
&= \mathrm{sub}(x', r)\{x_j \leftarrow p'[\beta(x_{j1},w,e), \ldots, \beta(x_{jm},w,e)] \mid x_j \in X_c(x)\}\{x_j \leftarrow \beta(x_j,w,e) \mid x_j \in X \setminus X_c(x)\} \\
&= \mathrm{sub}(x', r)\{x_j \leftarrow p'[x_{j1}, \ldots, x_{jm}] \mid x_j \in X_c(x)\}\{x_{jk} \leftarrow \beta(x_{jk},w,e) \mid x_j \in X_c(x), k = 1, \ldots, m\}\{x_j \leftarrow \beta(x_j,w,e) \mid x_j \in X \setminus X_c(x)\} \\
&= \mathrm{sub}(x', r)\{x_j \leftarrow p'[x_{j1}, \ldots, x_{jm}] \mid x_j \in X_c(x)\}\{x'' \leftarrow \beta(x'',w,e) \mid x'' \in X'\}.
\end{aligned}
\]



Consequently, substitutions can be modified in the following way: Each variable $x_j \in X_c(x)$ is replaced by the pattern $p'$, in which variables are replaced by the new variables $x_{j1}, \ldots, x_{jm}$. All substitutions are terms in $T_\Sigma(X')$ and do not refer to variables in $X_c(x)$.

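Furthermore, for all $w \in W$, $e \in E$ holds:

\[
\begin{aligned}
& t_G\{x_j \leftarrow \beta(x_j,w,e) \mid x_j \in X_c(x)\}\{x_j \leftarrow \beta(x_j,w,e) \mid x_j \in X \setminus X_c(x)\} \\
&= t_G\{x_j \leftarrow p'[\beta(x_{j1},w,e), \ldots, \beta(x_{jm},w,e)] \mid x_j \in X_c(x)\}\{x_j \leftarrow \beta(x_j,w,e) \mid x_j \in X \setminus X_c(x)\} \\
&= t_G\{x_j \leftarrow p'[x_{j1}, \ldots, x_{jm}] \mid x_j \in X_c(x)\}\{x_{jk} \leftarrow \beta(x_{jk},w,e) \mid x_j \in X_c(x), k = 1, \ldots, m\}\{x_j \leftarrow \beta(x_j,w,e) \mid x_j \in X \setminus X_c(x)\} \\
&= t_G\{x_j \leftarrow p'[x_{j1}, \ldots, x_{jm}] \mid x_j \in X_c(x)\}\{x' \leftarrow \beta(x',w,e) \mid x' \in X'\}.
\end{aligned}
\]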





If in the body $t_G$ variables $x_j \in X_c(x)$ are replaced by pattern $p'$ which is instantiated with the new variables $x_{j1}, \ldots, x_{jm}$, then the same unfoldings are generated with instantiations $\beta(x',w,e)$ of variables $x' \in X'$.

Consequently, a new subprogram can be constructed in an analogous way 
as in case 1: 

- Replace variables $x_j \in X_c(x)$ by pattern $p'$ which is instantiated with variables $x_{j1}, \ldots, x_{jm}$
  - in the body, and
  - in all substitution terms.

- Delete the substitutions for the obsolete variables from $X_c(x)$ in all calls.

- Introduce the substitutions for the new variables in all calls.

- Construct the parameter list of the program head consistent with variables $X'$ used in the recursive calls.
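The steps of case 2 can be traced on a small constructed example. It is not taken from the text: the signature (cons, nil, a, b), the single recursive call $r$, and the symbol $\epsilon$ for the initial call are assumptions made only for illustration.

```latex
% Assumed setting: X_c(x) = {x}, sub(x, r) = cons(a, x), beta(x, \epsilon, e) = cons(b, nil).
% All instantiations of x share the non-trivial pattern p' = cons(x_1, x_2), so m = 2.
\begin{aligned}
\beta(x,\epsilon,e) &= cons(b, nil) = p'[\beta(x_1,\epsilon,e), \beta(x_2,\epsilon,e)]
  &&\text{with } \beta(x_1,\epsilon,e) = b,\ \beta(x_2,\epsilon,e) = nil \\
\mathrm{sub}(x,r) &= cons(a, x) = p'[\mathrm{sub}(x_1,r), \mathrm{sub}(x_2,r)]
  &&\text{with } \mathrm{sub}(x_1,r) = a,\ \mathrm{sub}(x_2,r) = x \\
\mathrm{sub}(x_2,r) &= cons(x_1, x_2)
  &&\text{after replacing } x \text{ by } p'[x_1,x_2] \\
\beta(x_1,r,e) &= a, \quad \beta(x_2,r,e) = cons(b, nil) \\
p'[\beta(x_1,r,e), \beta(x_2,r,e)] &= cons(a, cons(b, nil)) = \beta(x,r,e)
\end{aligned}
```

In this sketch the initial instantiation shrinks from three nodes (cons, b, nil) to two (b and nil), which is the reduction the construction relies on.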

The initial instantiations for the new variables result from $\beta(x_j,w,e) = p'[\beta(x_{j1},w,e), \ldots, \beta(x_{jm},w,e)]$ (introduced above). The initial instantiations for variables from $X_c(x)$ are no longer necessary. With the reduced initial instantiations the new subprogram still has the same language for each example.

Consequences. If instantiations of a variable $x \in X$ in a subprogram share a non-trivial pattern $p'$, then a new, maximized subprogram with new initial instantiations can be constructed. By moving $p'$ to the program body, the number of nodes (symbols) in the initial instantiations gets reduced (case 1). If a variable of the new subprogram has instantiations which share a non-trivial pattern, a new subprogram can be constructed iteratively (case 2). Because the number of nodes in the initial instantiation is finite and is reduced in each iteration, after a finite number of steps a subprogram is constructed together with initial instantiations such that the instantiations of each variable do not share a non-trivial pattern.
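To make the notion of a shared pattern concrete, the following is a minimal sketch, assuming terms are written as Lisp lists of the form (symbol arg ...). The function name common-pattern and the placeholder symbol ? are hypothetical and do not come from the book. The sketch is also weaker than a full anti-unification, since it does not reuse one variable for identical disagreements.

```lisp
;; Hypothetical helper (not from the book): computes a first-order pattern
;; shared by all TERMS. Positions where the terms disagree become the
;; placeholder symbol ?. A result other than ? is a non-trivial pattern.
(defun common-pattern (terms)
  (let ((first-term (first terms)))
    (cond
      ;; all terms identical: the term itself is the pattern
      ((every (lambda (term) (equal term first-term)) terms) first-term)
      ;; same function symbol and arity: descend into the arguments
      ((and (every #'consp terms)
            (every (lambda (term)
                     (and (eq (first term) (first first-term))
                          (= (length term) (length first-term))))
                   terms))
       (cons (first first-term)
             (apply #'mapcar
                    (lambda (&rest args) (common-pattern args))
                    (mapcar #'rest terms))))
      ;; otherwise the terms disagree at this position
      (t '?))))
```

For instance, (common-pattern '((cons a nil) (cons b nil))) yields (cons ? nil), while two terms with different root symbols yield only ?, that is, no non-trivial pattern.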
For the proof of theorem 7.3.4, the following lemma is needed:

Lemma B.3.1 (No Common Non-Trivial Pattern for Variables).

Let $t_s \in T_\Sigma(X_i)$ be a term, such that for $x_j \in X_i$ and $r \in R$ holds: $\forall w \in W$:

\[
\beta_{w \circ r}(x_j) = t_s\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\}.
\]

1. For all $u$ with $u \in \bigcap_{w \in W} \mathrm{pos}(\beta_{w \circ r}(x_j))$ holds:

2. if there exists a variable $x_k \in X_i$ with $\forall w \in W$: $\beta_{w \circ r}(x_j)|_u = \beta_w(x_k)$, and

3. $\forall u' < u\ \forall w_1, w_2 \in W$: $\mathrm{node}(\beta_{w_1 \circ r}(x_j), u') = \mathrm{node}(\beta_{w_2 \circ r}(x_j), u')$,
then $t_s|_u = x_k$.

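Proof (No Common Non-Trivial Pattern for Variables). Assume there exists a position $u$ which together with variable $x_k \in X_i$ fulfills properties (1), (2), and (3) stated in lemma B.3.1, but for which $t_s|_u = x_k$ does not hold. We must consider two cases: $u \in \mathrm{pos}(t_s)$ and $u \notin \mathrm{pos}(t_s)$.

Case 1: $u \in \mathrm{pos}(t_s)$

1.a: Let $t_s|_u = x_h$ and $x_h \neq x_k$. Then holds for all $w \in W$:

\[
\begin{aligned}
\beta_{w \circ r}(x_j)|_u &= \beta_w(x_k) && \text{(because of lemma B.3.1, 2)} \\
\beta_{w \circ r}(x_j)|_u &= \beta_w(x_h) && \text{(by assumption)}
\end{aligned}
\]

and therefore, for all $w \in W$: $\beta_w(x_k) = \beta_w(x_h)$. Because $t_i$ (that is, the body of subprogram $G_i$, see theorem 7.3.4) is a maximal pattern, it must hold $x_k = x_h$, which is a contradiction to the assumption.

1.b: Let $t_s|_u \notin X_i$. Then holds for all $w \in W$:

\[
\beta_w(x_k) = t_s|_u\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\}
\]

because

\[
\begin{aligned}
\beta_{w \circ r}(x_j)|_u &= \beta_w(x_k) && \text{(because of lemma B.3.1, 2)} \\
\beta_{w \circ r}(x_j) &= t_s\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\} \\
\beta_{w \circ r}(x_j)|_u &= t_s\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\}|_u \\
&= t_s|_u\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\} && \text{(because } u \in \mathrm{pos}(t_s)\text{)}.
\end{aligned}
\]

Because by assumption $t_s|_u$ is not in $X_i$, $t_s|_u$ must be a non-trivial pattern of instantiations $\beta_w(x_k)$ of $x_k$. This is a contradiction to the assumption that $t_i$ is a maximal pattern of unfoldings over $\beta$.

Case 2: $u \notin \mathrm{pos}(t_s)$

Because $u$ is a position of all $\beta_{w \circ r}(x_j)$ and by assumption

\[
\beta_{w \circ r}(x_j) = t_s\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\}
\]

there must exist a position $u_s < u$ with $u_s \in \mathrm{pos}(t_s)$ and $t_s|_{u_s} \in X_i$. Let $t_s|_{u_s} = x_h$. Then for all $w \in W$ holds:

\[
\begin{aligned}
\beta_{w \circ r}(x_j)|_{u_s} &= t_s\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\}|_{u_s} \\
&= t_s|_{u_s}\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\} && \text{(because } u_s \in \mathrm{pos}(t_s)\text{)} \\
&= x_h\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\} && \text{(because } t_s|_{u_s} = x_h\text{)} \\
&= \beta_w(x_h).
\end{aligned}
\]

By assumption, equation (3) in lemma B.3.1 holds for all $u' < u$ and therefore also for $u_s < u$: for all $w_1, w_2 \in W$

\[
\begin{aligned}
\mathrm{node}(\beta_{w_1 \circ r}(x_j), u_s) &= \mathrm{node}(\beta_{w_2 \circ r}(x_j), u_s) \\
\mathrm{node}(\beta_{w_1}(x_h), \lambda) &= \mathrm{node}(\beta_{w_2}(x_h), \lambda).
\end{aligned}
\]

Consequently, instantiations of variable $x_h$ share a non-trivial pattern, which is a contradiction to the assumption that $t_i$ is a maximal pattern of unfoldings over $\beta$.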

Now we can prove theorem 7.3.4:

Proof (Uniqueness of Substitutions). Let $t_s \in T_\Sigma(X_i)$ be a term which fulfills equation (*) $\forall w \in W$: $\beta_{w \circ r}(x_j) = t_s\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\}$ given in theorem 7.3.4. We must distinguish the cases that (1) $t_s$ has a variable at position $u$ and (2) the subtree in $t_s$ at position $u$ does not contain a variable.

Case 1: Let $u \in \mathrm{pos}(t_s)$ with $t_s|_u \in X_i$ and let $t_s|_u = x_k$.

Then holds: $\beta_{w \circ r}(x_j)|_u = \beta_w(x_k)$ and, because of (*), that for all $u' < u$ and all $w_1, w_2 \in W$: $\mathrm{node}(\beta_{w_1 \circ r}(x_j), u') = \mathrm{node}(\beta_{w_2 \circ r}(x_j), u')$. Because of lemma B.3.1 holds $\mathrm{sub}(x_j,r)|_u = t_s|_u$ and for all $u' < u$: $\mathrm{node}(\mathrm{sub}(x_j,r), u') = \mathrm{node}(t_s, u')$.

Case 2: Let $u \in \mathrm{pos}(t_s)$ with $t_s|_u \in T_\Sigma$.
Then holds:

\[
\begin{aligned}
\beta_{w \circ r}(x_j)|_u &= t_s\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\}|_u \\
&= t_s|_u\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\} \\
&= t_s|_u.
\end{aligned}
\]

2.a: Let $u \in \mathrm{pos}(\mathrm{sub}(x_j,r))$. Let $u' \in \mathrm{pos}(\mathrm{sub}(x_j,r))$ be a position with $u < u'$ and let $\mathrm{sub}(x_j,r)|_{u'} \in X_i$ with $\mathrm{sub}(x_j,r)|_{u'} = x_k$. Then holds for all $w \in W$: $\beta_{w \circ r}(x_j)|_{u'} = \beta_w(x_k)$, and because $\beta_{w \circ r}(x_j)|_u = t_s|_u$ it holds $\beta_{w \circ r}(x_j)|_{u'} = t_s|_{u'}$ and therefore $\beta_w(x_k) = t_s|_{u'}$. The instantiations of $x_k$ then must all be identical and share the non-trivial pattern $t_s|_{u'}$. This is a contradiction to the assumption that $t_i$ is a maximal pattern of unfoldings over $\beta$. It follows: if $u \in \mathrm{pos}(\mathrm{sub}(x_j,r))$ then term $\mathrm{sub}(x_j,r)|_u$ cannot contain variables and it must hold:

\[
\beta_{w \circ r}(x_j)|_u = \mathrm{sub}(x_j,r)|_u\{x_1 \leftarrow \beta_w(x_1), \ldots, x_{m_i} \leftarrow \beta_w(x_{m_i})\} = \mathrm{sub}(x_j,r)|_u = t_s|_u.
\]

2.b: Let $u \notin \mathrm{pos}(\mathrm{sub}(x_j,r))$. Because $u$ is a position in all $\beta_{w \circ r}(x_j)$, in $\mathrm{sub}(x_j,r)$ at a position $u' < u$ must be a variable $x_k \in X_i$. Because then holds for all $w \in W$: $\beta_{w \circ r}(x_j)|_{u'} = \beta_w(x_k)$, and for all $u'' < u$ and $w_1, w_2 \in W$: $\mathrm{node}(\beta_{w_1 \circ r}(x_j), u'') = \mathrm{node}(\beta_{w_2 \circ r}(x_j), u'')$, and because of lemma B.3.1, it must hold for $t_s$ that $t_s|_{u'} = x_k$. Because $u' < u$ this is a contradiction to the assumption $u \in \mathrm{pos}(t_s)$.

Because of case 1 holds for all positions $u \in \mathrm{pos}(t_s)$ with $\exists u'$: $t_s|_{u'} \in X_i$ and $u < u'$ that $\mathrm{node}(t_s,u) = \mathrm{node}(\mathrm{sub}(x_j,r),u)$. Because of case 2 holds for all positions $u \in \mathrm{pos}(t_s)$ with $\nexists u'$: $t_s|_{u'} \in X_i$ and $u < u'$ that $\mathrm{node}(t_s,u) = \mathrm{node}(\mathrm{sub}(x_j,r),u)$. Therefore, for all $u \in \mathrm{pos}(t_s)$ holds $\mathrm{node}(t_s,u) = \mathrm{node}(\mathrm{sub}(x_j,r),u)$ and therefore holds $t_s = \mathrm{sub}(x_j,r)$.

C.1 Fibonacci with Sequence Referencing Function

The Fibonacci function as inferred with a genetic programming algorithm (Koza, 1992, chap. 18.3):

(+ (SRF (- J 2) 0)
   (SRF (+ (+ (- J 2) 0) (SRF (- J J) 0))
        (SRF (SRF 3 1) 1)))

A realization in Lisp:

(defun kfib (x)
  (+ (srf x (- x 2) 0)
     (srf x (+ (+ (- x 2) 0) (srf x (- x x) 0))
          (srf x (srf x 3 1) 1))))

; sequence referencing function (srf cur x def)
; if 0 <= x <= cur then fib(x) else default def
; (position 0 must be referable: kfib relies on (srf x 0 0) = fib(0) = 1)
(defun srf (cur x def)
  (cond ((< x 0) def)
        ((<= x cur) (fib x))
        (T def)))



; Koza's SRF calculates the Fibonacci numbers for x = 0..20
; in ascending order and saves them in a global table;
; we just calculate it by standard Fibonacci
(defun fib (x)
  (cond ((= 0 x) 1)
        ((= 1 x) 1)
        (T (+ (fib (- x 1)) (fib (- x 2))))))

A simplification of the synthesized function:

(defun skfib (x)
  (+ (srf x (- x 2) 0)
     (srf x (- x 1)
          (srf x (srf x 3 1) 1))))
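The claim that the simplified expression computes the same values can be checked with a small self-contained sketch. All names prefixed with chk- are hypothetical, chosen so that nothing in the listing above is redefined. The sketch assumes the reading of SRF in which every position 0 <= x <= cur can be referenced.

```lisp
;; Self-contained check (hypothetical chk- names, not from the book).
(defun chk-fib (x)
  ;; standard Fibonacci with fib(0) = fib(1) = 1
  (if (< x 2) 1 (+ (chk-fib (- x 1)) (chk-fib (- x 2)))))

(defun chk-srf (cur x def)
  ;; assumed reading of SRF: sequence value for 0 <= x <= cur, default otherwise
  (if (<= 0 x cur) (chk-fib x) def))

(defun chk-kfib (x)
  ;; the expression as evolved by genetic programming
  (+ (chk-srf x (- x 2) 0)
     (chk-srf x (+ (+ (- x 2) 0) (chk-srf x (- x x) 0))
              (chk-srf x (chk-srf x 3 1) 1))))

(defun chk-skfib (x)
  ;; the simplification: (+ (+ (- x 2) 0) (srf x 0 0)) becomes (- x 1)
  (+ (chk-srf x (- x 2) 0)
     (chk-srf x (- x 1)
              (chk-srf x (chk-srf x 3 1) 1))))

(defun chk-agree-p (limit)
  ;; true if all three definitions agree for 0..limit
  (loop for x from 0 to limit
        always (= (chk-fib x) (chk-kfib x) (chk-skfib x))))
```

Under these assumptions (chk-agree-p 15) returns T. The simplification works because (srf x 0 0) evaluates to fib(0) = 1, so the second index reduces to x - 1.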



U. Schmid: Inductive Synthesis of Functional Programs, LNAI 2654, pp. 369-389, 2003.
© Springer-Verlag Berlin Heidelberg 2003

C.2 Inducing 'Reverse' with Golem

Trace of the ILP system Golem (Muggleton & Feng, 1990) for inducing the recursive clause reverse.

Positive Examples:

rev([],[]).
rev([1],[1]).
rev([2],[2]).
rev([3],[3]).
rev([4],[4]).
rev([1,2],[2,1]).
rev([1,3],[3,1]).
rev([1,4],[4,1]).
rev([2,2],[2,2]).
rev([2,3],[3,2]).
rev([2,4],[4,2]).
rev([0,1,2],[2,1,0]).
rev([1,2,3],[3,2,1]).

Negative Examples:

rev([1],[]).
rev([0,1],[0,1]).
rev([0,1,2],[2,0,1]).
app([1],[0],[0,1]).

Background Knowledge:

:- mode(rev(+,-)).
:- mode(app(+,+,-)).
rev([],[]).
rev([1],[1]).
rev([2],[2]).
rev([3],[3]).
rev([4],[4]).
rev([1,2],[2,1]).
rev([1,3],[3,1]).
rev([1,4],[4,1]).
rev([2,2],[2,2]).
rev([2,3],[3,2]).
rev([2,4],[4,2]).
rev([0,1,2],[2,1,0]).
rev([1,2,3],[3,2,1]).
app([],[],[]).
app([1],[],[1]).
app([2],[],[2]).
app([3],[],[3]).
app([4],[],[4]).
app([],[1],[1]).
app([],[2],[2]).
app([],[3],[3]).
app([],[4],[4]).
app([1],[0],[1,0]).
app([2],[1],[2,1]).
app([2],[1],[2,1]).
app([3],[1],[3,1]).
app([4],[1],[4,1]).
app([2],[2],[2,2]).
app([3],[2],[3,2]).
app([4],[2],[4,2]).
app([2,1],[0],[2,1,0]).
app([3,2],[1],[3,2,1]).
nat(0).
nat(1).
nat(2).
nat(3).
nat(4).
nat(5).
nat(6).
nat(7).
nat(8).
nat(9).

Trace:

[:- mode(rev(+,-)). - Time taken 0ms]
[:- mode(app(+,+,-)). - Time taken 0ms]

[Rlgg of pair:]
rev([],[]).
rev([2],[2]).
[is]
rev(A,A) :- app(A,A,B).
[Number of negatives covered=0]
[Number of positives covered=2]

[Rlgg of pair:]
rev([],[]).
rev([1,2],[2,1]).
[is]
rev(A,B) :- app(C,D,B).
[Number of negatives covered=1]

[Rlgg of pair:]
rev([],[]).
rev([1,3],[3,1]).
[is]
rev(A,B) :- app(C,D,B).
[Number of negatives covered=1]

[Rlgg of pair:]
rev([1],[1]).
rev([1,4],[4,1]).
[is]
rev([1|A],[B|C]) :- rev(A,A), rev(C,C), app([2,1],[0],[2,1,0]),
    app(A,[1],[B|C]), app(D,A,[B]), nat(1), nat(B), rev([],[]),
    app([1],[0],[1,0]), app([1],[],[1]), app([2],[1],[2,1]),
    app([2],[2],[2,2]), app([3,2],[1],[3,2,1]), app([B],[],[B]),
    app([B],C,[B|C]), app([],[1],[1]), app([],[B],[B]),
    app([],[],[]), nat(0), nat(2).
[Number of negatives covered=0]
[Number of positives covered=4]

[Rlgg of pair:]
rev([2],[2]).
rev([1,3],[3,1]).
[is]
rev([A|B],[C|D]) :- rev([1|E],[F|D]), rev(B,B), rev(D,D),
    app([G,A],[H],[G,A,H]), app(B,[A],[C|D]), app(I,B,[C]), nat(A),
    nat(C), rev([1],[1]), rev([F],[F]), rev([],[]), rev(E,E),
    app([1],[0],[1,0]), app([2,1],[0],[2,1,0]), app([3,2],[1],[3,2,1]),
    app([A],[H],[A,H]), app([A],[],[A]), app([A],I,[A|I]),
    app([C,F],[J],[C,F,J]), app([C],[1],[C,1]), app([C],[A],[C,A]),
    app([C],[F],[C,F]), app([C],[],[C]), app([C],D,[C|D]),
    app([C],E,[C|E]), app([C],I,[C|I]), app([F],[],[F]),
    app([F],D,[F|D]), app([F],E,[F|E]), app([G],[A],[G,A]),
    app([],[A],[A]), app([],[C],[C]), app([],[F],[F]), app([],[],[]),
    app(B,[F],[K|E]), nat(1), nat(F), nat(G), nat(H).
[Number of negatives covered=0]
[Number of positives covered=2]

[Rlgg of pair:]
rev([2],[2]).
rev([2,3],[3,2]).
[is]
rev([2|A],[B|C]) :- rev([1|A],[D|E]), rev([F|C],[F|C]), rev(A,A),
    rev(C,C), app([2],[2],[2,2]), app([2],C,[2|C]),
    app([3,2],[1],[3,2,1]), app(A,[2],[B|C]), app(G,A,[B]), nat(2),
    nat(B), rev([1],[1]), rev([D],[D]), rev([F],[F]), rev([],[]),
    rev(E,E), app([1],[0],[1,0]), app([2,1],[0],[2,1,0]),
    app([2],[1],[2,1]), app([2],[],[2]), app([2],E,[2|E]),
    app([2],G,[2|G]), app([3],[1],[3,1]), app([3],[2],[3,2]),
    app([B,F],[H],[B,F,H]), app([B],[1],[B,1]), app([B],[2],[B,2]),
    app([B],[F],[B,F]), app([B],[],[B]), app([B],C,[B|C]),
    app([B],E,[B|E]), app([B],G,[B|G]), app([D],[],[D]),
    app([D],C,[D|C]), app([F],E,[F|E]), app([],[2],[2]),
    app([],[B],[B]), app([],[D],[D]), app([],[],[]), app(A,E,I),
    app(E,J,[1]), app(G,[D],[B|J]), app(J,E,[1]), app(K,[B],[3|G]),
    app(K,A,[3]), nat(1), nat(3), nat(D), nat(F).
[Number of negatives covered=0]
[Number of positives covered=2]

[Rlgg of pair:]
rev([3],[3]).
rev([1,4],[4,1]).
[is]
rev([A|B],[C|D]) :- rev(B,B), rev(D,D), app(B,[A],[C|D]), app(E,B,[C]),
    nat(A), nat(C), rev([],[]), app([A],[],[A]), app([C],[],[C]),
    app([C],D,[C|D]), app([],[A],[A]), app([],[C],[C]), app([],[],[]).
[Number of negatives covered=0]
[Number of positives covered=10]

[Rlgg of pair:]
rev([1,4],[4,1]).
rev([2,3],[3,2]).
[is]
rev([A,B],[B,A]) :- rev([A],[A]), rev([B],[B]), rev([],[]),
    app([A],[],[A]), app([B],[A],[B,A]), app([B],[],[B]),
    app([C,A],[D],[C,A,D]), app([],[A],[A]), app([],[B],[B]),
    app([],[],[]), nat(A), nat(B), rev([C],[C]), app([A],[D],[A,D]),
    app([C],[A],[C,A]), app([C],[],[C]), app([],[C],[C]), nat(C),
    nat(D).
[Number of negatives covered=0]
[Number of positives covered=6]
[Pos-neg cover=10,potential-examples=3]

[Rlgg of :]
rev([],[]).
rev([3],[3]).
rev([1,4],[4,1]).
[is]
rev(A,B).
[Number of negatives covered=3]
[Overgeneral]

[Rlgg of :]
rev([0,1,2],[2,1,0]).
rev([3],[3]).
rev([1,4],[4,1]).
[is]
rev([A|B],[C|D]) :- rev(B,E), app(E,[A],[C|D]), nat(A), nat(C),
    rev([],[]), app([],[],[]).
[Number of negatives covered=0]
[OK]

[Rlgg of :]
rev([1,2,3],[3,2,1]).
rev([3],[3]).
rev([1,4],[4,1]).
[is]
rev([A|B],[C|D]) :- rev(B,E), rev(F,D), app(E,[A],[C|D]), nat(A),
    nat(C), rev([],[]), app([A],[],[A]), app([],[A],[A]),
    app([],[],[]).
[Number of negatives covered=0]
[OK]
[Best-atom rev([0,1,2],[2,1,0])]
[Pos-neg cover=12,potential-examples=0]
[Cover=12]
[Reducing clause]
[Adding clause rev([A|B],[C|D]) :- rev(B,E), app(E,[A],[C|D]).]
[REMOVED: 12 FORES, 0 NEGS]
[FORES: 1, BACKS: 41, NEGS: 4, RULES: 1]
[Induction time 498ms]
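The clause added at the end of the trace reads naturally as a recursive function. The following sketch transliterates it into Lisp together with the base fact rev([],[]). The function name golem-rev is hypothetical, and the translation of app/3 to append is an assumption about its intended meaning.

```lisp
;; Hypothetical functional reading of the induced program (not Golem output):
;;   rev([], []).
;;   rev([A|B], [C|D]) :- rev(B, E), app(E, [A], [C|D]).
(defun golem-rev (l)
  (if (null l)
      nil
      ;; reverse the tail B to get E, then append the singleton list [A]
      (append (golem-rev (rest l)) (list (first l)))))
```

Applied to the positive examples, (golem-rev '(0 1 2)) gives (2 1 0) and (golem-rev '(1 4)) gives (4 1). The negative example rev([0,1,2],[2,0,1]) is not reproduced.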



C.3 Finite Program for ‘Unstack’ 

#S(TPLAN :PNAME P1
   :SUPLAN
   (#S(PSTEP :INSTOP NIL :PARENT NIL
       :CHILD
       ((CLEAR (SUCC (SUCC O3))) (CLEAR (SUCC O3))
        (CLEAR O3))
       :PREC NIL :ADD NIL :DEL NIL :NODEID 0 :LEVEL 0)
    #S(PSTEP :INSTOP (UNSTACK (SUCC O3))
       :PARENT
       ((CLEAR (SUCC (SUCC O3))) (CLEAR (SUCC O3))
        (CLEAR O3))
       :CHILD
       ((ON O2 O3) (CLEAR (SUCC (SUCC O3)))
        (CLEAR (SUCC O3)))
       :PREC ((ON O2 O3) (CLEAR O2)) :ADD ((CLEAR O3))
       :DEL ((CLEAR O2) (ON O2 O3)) :NODEID 1 :LEVEL 1)
    #S(PSTEP :INSTOP (UNSTACK (SUCC (SUCC O3)))
       :PARENT
       ((ON O2 O3) (CLEAR (SUCC (SUCC O3)))
        (CLEAR (SUCC O3)))
       :CHILD
       ((ON O1 O2) (ON O2 O3) (CLEAR (SUCC (SUCC O3))))
       :PREC ((ON O1 O2) (CLEAR O1)) :ADD ((CLEAR O2))
       :DEL ((CLEAR O1) (ON O1 O2)) :NODEID 2 :LEVEL 2))
   :COPLAN
   (#S(PSTEP :INSTOP NIL :PARENT NIL :CHILD ((CLEAR O3))
       :PREC NIL :ADD NIL :DEL NIL :NODEID 0 :LEVEL 0)
    #S(PSTEP :INSTOP (UNSTACK (SUCC O3)) :PARENT ((CLEAR O3))
       :CHILD ((CLEAR (SUCC O3)))
       :PREC ((ON O2 O3) (CLEAR O2)) :ADD ((CLEAR O3))
       :DEL ((CLEAR O2) (ON O2 O3)) :NODEID 1 :LEVEL 1)
    #S(PSTEP :INSTOP (UNSTACK (SUCC (SUCC O3)))
       :PARENT ((CLEAR (SUCC O3)))
       :CHILD ((CLEAR (SUCC (SUCC O3))))
       :PREC ((ON O1 O2) (CLEAR O1)) :ADD ((CLEAR O2))
       :DEL ((CLEAR O1) (ON O1 O2)) :NODEID 2 :LEVEL 2))
   :TERM
   (IF (CLEAR O3 S)
       S
       (UNSTACK (SUCC O3) S
        (IF (CLEAR (SUCC O3) S)
            S
            (UNSTACK (SUCC (SUCC O3)) S
             (IF (CLEAR (SUCC (SUCC O3)) S) S OMEGA)))))
   :PTYPE SEQ :NEWDAT NIL :NEWPRED NIL :NPFCT NIL
   :GOALPRED (CLEAR O3) :BOTTOM O3
   :PASSOCS ((O2 (SUCC O3)) (O1 (SUCC (SUCC O3))))
   :PAFCT
   (DEFUN SUCC (X S)
     (COND ((NULL S) NIL)
           ((AND (EQUAL (FIRST (CAR S)) 'ON)
                 (EQUAL (NTH 2 (CAR S)) X))
            (NTH 1 (CAR S)))
           (T (SUCC X (CDR S)))))
   :PPARAMS NIL :PVALUES NIL)

For sequences, the slots newdat, newpred, and npfct are not required. The slots pparams and pvalues are filled after folding.
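The finite term in the :TERM slot repeats one conditional three times along the SUCC chain. A recursive program of the following shape is one plausible result of folding it. This is a sketch under assumptions, not the system's output: the names clear-p, block-on, apply-unstack, and unstack-all are hypothetical, and states are assumed to be lists of literals such as (on o1 o2) and (clear o1).

```lisp
;; Hypothetical sketch of a folded 'unstack' program (not from the book).
(defun clear-p (x s)
  ;; true if the literal (clear x) is part of state s
  (member (list 'clear x) s :test #'equal))

(defun block-on (x s)
  ;; the block lying directly on x, or nil (plays the role of SUCC)
  (second (find-if (lambda (lit)
                     (and (eq (first lit) 'on) (eq (third lit) x)))
                   s)))

(defun apply-unstack (x s)
  ;; applies the operator as given in the plan steps:
  ;; PREC ((on x y) (clear x)), ADD ((clear y)), DEL ((clear x) (on x y))
  (let ((on-lit (find-if (lambda (lit)
                           (and (eq (first lit) 'on) (eq (second lit) x)))
                         s)))
    (if (and on-lit (clear-p x s))
        (cons (list 'clear (third on-lit))
              (remove on-lit
                      (remove (list 'clear x) s :test #'equal)
                      :test #'equal))
        s)))

(defun unstack-all (x s)
  ;; if x is clear return s, else first clear the block on x, then unstack it
  (let ((top (block-on x s)))
    (if (or (clear-p x s) (null top))
        s
        (apply-unstack top (unstack-all top s)))))
```

For the three-block tower, (unstack-all 'o3 '((on o1 o2) (on o2 o3) (clear o1))) unfolds three times, matching the depth of the finite term, and returns ((clear o3)).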



C.4 Recursive Control Rules for the ‘Rocket’ Domain 

; Control Knowledge for ROCKET

; call for example
;   (rocket '(o1 o2 o3) '((at o1 a) (at o2 a) (at o3 a) (at rocket a)))
; or, including generation of oset
;   (start-r '((at o1 b) (at o2 b) (at o3 b))
;            '((at o1 a) (at o2 a) (at o3 a) (at rocket a)))

; predefined set-selectors
(defun pick (oset) (car oset))

(defun rst (oset) (cdr oset))

; generalized predicates inferred during plan transformation
(DEFUN AT* (ARGS S)
  (COND ((NULL ARGS) T)
        ((AND (NULL S) (NOT (NULL ARGS))) NIL)
        ((AND (EQUAL (CAAR S) 'AT)
              (INTERSECTION ARGS (CDAR S)))
         (AT* (SET-DIFFERENCE ARGS (CDAR S)) (CDR S)))
        (T (AT* ARGS (CDR S)))))

(DEFUN INSIDE* (ARGS S)
  (COND ((NULL ARGS) T)
        ((AND (NULL S) (NOT (NULL ARGS))) NIL)
        ((AND (EQUAL (CAAR S) 'INSIDE)
              (INTERSECTION ARGS (CDAR S)))
         (INSIDE* (SET-DIFFERENCE ARGS (CDAR S)) (CDR S)))
        (T (INSIDE* ARGS (CDR S)))))

; explicit operator application
; in combination with DPlan, the add-del-lists are applied to s
(defun unload (o s)
  (print `(unload ,o ,s))
  (cond ((null s) nil)
        ((member o (car s) :test 'equal) (cons (list 'at o 'b) (cdr s)))
        (T (cons (car s) (unload o (cdr s))))))

(defun loadr (o s)
  (print `(load ,o ,s))
  (cond ((null s) nil)
        ((member o (car s) :test 'equal)
         (cons (list 'inside o 'rocket) (cdr s)))
        (T (cons (car s) (loadr o (cdr s))))))

(defun move-rocket (s)
  (print `(move-rocket ,s))
  (cond ((null s) nil)
        ((equal (car s) '(at rocket a)) (cons (list 'at 'rocket 'b) (cdr s)))
        (T (cons (car s) (move-rocket (cdr s))))))

; generalized control rules
; abstraction from destination (B) (for at*) and vehicle (Rocket) (for inside*)
(defun unload-all (oset s)
  (if (at* oset s)
      s
      (unload (pick oset) (unload-all (rst oset) s))))

(defun load-all (oset s)
  (if (inside* oset s)
      s
      (loadr (pick oset) (load-all (rst oset) s))))

(defun rocket (oset s) (unload-all oset (move-rocket (load-all oset s))))

; "meta"-function, generating the set of objects to be transported
; from the top-level goals
(defun start-r (g s) (rocket (make-co g '(at* CO x)) s))

(defun make-co (goal newpat)
  (cond ((null goal) nil)
        ((string< (string (caar goal)) (string (car newpat)))
         (cons (nth (position 'CO newpat) (car goal))
               (make-co (cdr goal) newpat)))))

Interleaving 'at' and 'inside'. The generalized predicates (at* oset B) and (inside* oset Rocket) are complements. If all objects are at a location, no object is inside the vehicle and vice versa. The unload-all function presupposes that all objects in oset are inside the rocket, and the load-all function presupposes that all objects in oset are at the current location. The construction of a complex object from the plan is driven by the top-level goals (or, for a sub-plan, by the predicates in its root-node). After transforming both sub-plans into finite programs and generalizing over them, it becomes clear that both sub-plans share the parameter oset with initial value (o1 o2 o3).

Analyzing the relationship between the objects in the at*-set and the inside*-set could lead to an alternative implementation of these generalized predicates: (at* oset l) is true if no object is inside the rocket, that is, if for (inside* oset rocket) the oset is empty (s contains no literal (inside o rocket)), and analogously for (inside* oset rocket).
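The alternative implementation mentioned above can be sketched as a single test on the state. The function name none-inside-p is hypothetical. The sketch assumes that every object of oset is either at a location or inside the rocket, which is the complement relation described in the text.

```lisp
;; Hypothetical sketch (not from the book): under the complement assumption,
;; (at* oset l) can be decided by checking that the state s contains no
;; literal (inside o rocket) for any object o of oset.
(defun none-inside-p (oset s)
  (notany (lambda (lit)
            (and (eq (first lit) 'inside)
                 (member (second lit) oset)))
          s))
```

For the initial state '((at o1 a) (at o2 a) (at o3 a) (at rocket a)) and oset (o1 o2 o3) the test is true. As soon as one object has been loaded, for example after (inside o1 rocket) replaces (at o1 a), it is false.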



C.5 The ‘Selection Sort’ Domain 

Sorting Lists with Three Elements. The universal plan for sorting lists with three elements is given in figure C.1. Note that in constructing the universal plan, it is random whether (swap p q) or (swap q p) is the first instantiation. Because the operator is symmetric, both applications result in an identical state. Only the first application is integrated in the plan. Restriction of swapping only from smaller positions to larger ones (or the other way round) can be done by extending state specifications by ((gt P3 P2) (gt P3 P1) (gt P2 P1)) and the application-condition of the swap-operator to ((isc p n1) (isc q n2) (gt n1 n2) (gt q p)).

Fig. C.1. Universal Plan for Sorting Lists with Three Elements

Minimal Spanning Trees. The nine minimal spanning trees of the 3-sort problem are given in figures C.2, C.3 and C.4. Three of these trees (the last three given) are generalizable to the selsort program: namely, all trees where the branching factor is as regular as possible (i.e., 3 to 2 vs. 3 to 1).







Fig. C.2. Minimal Spanning Trees for Sorting Lists with Three Elements

; TOWER: assuming subgoal independence
; abstract representation of control knowledge
;   G(n,n,s) = tow(n,n)(s, putt1(n,s))
;   G(i,n,s) = tow(i,n)(s, put1(i, s(i), G(s(i),n,s)))
;   put1(i,s(i),s) = put(i, s(i), clear(i, clear(s(i),s)))
;   putt1(i,s) = putt(i, clear(i,s))
;   clear(x,s) = ct(x,s)(s, putt(f(x), clear(f(x),s)))
; main: G(1,n,s)

; call (G 1 maxblock s)
; s for maxblock = 3


; ((on 1 2) (on 2 3) (ct 1) (ont 3))
Fig. C.3. Minimal Spanning Trees for Sorting Lists with Three Elements 







Fig. C.4. Minimal Spanning Trees for Sorting Lists with Three Elements

; ((ct 1) (ct 2) (ct 3) (ont 1) (ont 2) (ont 3))
; some s for maxblock > 3
; ((on 4 2) (on 2 1) (on 1 3) (ct 4) (ont 3))
; ((on 2 5) (on 5 1) (on 4 3) (ct 2) (ct 4) (ont 1) (ont 3))



; this would also give true if block n is ont and there are
; unsorted blocks above it!
(defun tow (i n s)
  (cond ((= i n) (cond ((member (cons 'ont (list n)) s :test 'equal)
                        (print `(tower ,@s)) T)
                       (T nil)))
        (T (cond ((and (member (cons 'on (list i (1+ i))) s :test 'equal)
                       (tow (1+ i) n s)) T)
                 (T nil)))))

(defun putt1 (i s) (puttable i (clear i s)))

(defun put1 (i s) (put i (1+ i) (clear i (clear (1+ i) s))))

(defun clear (x s)
  (cond ((member (cons 'ct (list x)) s :test 'equal)
         (print `(block ,@(list x) is clear now)) s)
        (T (print 'puttable-call-by-clear)
           (puttable (f x s) (clear (f x s) s)))))

(defun put (x y s)
  (print s)
  (cond ((null s) (print 'error-in-put) nil)
        ((and (member (cons 'ct (list x)) s :test 'equal)
              (member (cons 'ct (list y)) s :test 'equal))
         (print `(put ,@(list x) on ,@(list y) in ,@s)) (exec-put x y s))
        (T (print 'error-in-put-xy-not-clear) nil)))

(defun exec-put (x y s)
  (cond ((null s) nil)
        ((and (equal (first (car s)) 'ont) (equal (second (car s)) x))
         (exec-put x y (cdr s)))
        ((and (equal (first (car s)) 'on) (equal (second (car s)) x))
         (cons (cons 'ct (list (third (car s)))) (exec-put x y (cdr s))))
        ((and (equal (first (car s)) 'ct) (equal (second (car s)) y))
         (cons (cons 'on (list x y)) (exec-put x y (cdr s))))
        (T (cons (car s) (exec-put x y (cdr s))))))

(defun puttable (x s)
  (cond ((null s) (print 'error-in-puttable) nil)
        ((member (cons 'ct (list x)) s :test 'equal)
         (print `(puttable ,@(list x) in ,@s)) (exec-puttable x s))
        (T (print 'error-in-puttable-x-not-clear) nil)))

(defun exec-puttable (x s)
  (cond ((null s) (print 'x-maybe-already-on-table) nil)
        ((and (equal (first (car s)) 'on) (equal (second (car s)) x))
         (cons (cons 'ct (list (third (car s))))
               (cons (cons 'ont (list (second (car s))))
                     (cdr s))))
        (T (cons (car s) (exec-puttable x (cdr s))))))

; f: the block lying on x
(defun f (x s)
  (cond ((null s) nil)
        ((and (equal (first (car s)) 'on) (equal (third (car s)) x))
         (second (car s)))
        (T (f x (cdr s)))))

(defun G (i n s)
  (print `(G ,@(list i) ,@(list n) ,@s))
  (cond ((= i n) (cond ((tow i n s) s)       ; G(n,n,s)
                       (T (putt1 n s))))
        (T (cond ((tow i n s) s)             ; G(i,n,s), i < n
                 (T (put1 i (G (1+ i) n s)))))))

; TOWER-1; call e.g. (tower '(a b c d) '((on a d) (on d c) (on c b) (ct a)))

(defun tower (olist s)
  (if (subtow olist s)
      s
      (if (and (ct (first olist) s) (subtow (cdr olist) s))
          (put (first olist) (second olist) s)
          (if (and (singleblock olist) (ot (first olist) s))
              (clear-all (first olist) s)
              (if (and (singleblock olist) (ct (first olist) s))
                  (puttable (first olist) s)
                  (if (singleblock olist)
                      (puttable (first olist) (clear-all (first olist) s))
                      (if (and (ct (first olist) s) (on* (cdr olist) s))
                          (put (first olist) (second olist)
                               (clear-all (second olist) s))
                          (if (ct (first olist) s)
                              (put (first olist) (second olist)
                                   (tower (cdr olist) s))
                              (put (first olist) (second olist)
                                   (clear-all (first olist)
                                              (tower (cdr olist) s)
                                   ))))))))))

; clear-all macro
(defun clear-all (o s)
  (if (ct o s)
      s
      (puttable (succ o s) (clear-all (succ o s) s))))

(defun succ (x s)
  (cond ((null s) nil)
        ((and (equal (first (car s)) 'on)
              (equal (nth 2 (car s)) x))
         (nth 1 (car s)))
        (T (succ x (cdr s)))))

(defun singleblock (l) (null (cdr l)))

; correct subtower?
(defun subtow (olist s) (and (ct (car olist) s) (on* olist s)))

; tower contains a correct subtower?
(defun on* (olist s)
  (cond ((null olist) T) ; should not happen
        ((and (null (cdr olist)) (ot (car olist) s)) T)
        ((member (list 'on (first olist) (second olist)) s :test 'equal)
         (on* (cdr olist) s))
        (T nil)))

; given
(defun ct (o s) (member (list 'ct o) s :test 'equal))

; given as (ontable x) OR (here) inferred from state
(defun ot (o s)
  (null (mapcan #'(lambda (x) (and (equal 'on (first x))
                                   (equal o (second x))
                                   (list x)))
                s)))

; explicit application of put-operator
(defun put (x y s)
  (cond ((null s) (print `(put ,x ,y))
         nil)
        ((equal (car s) (list 'ct y)) (cons (list 'on x y) (put x y (cdr s))))
        ((and (equal (first (car s)) 'on)
              (equal (second (car s)) x))
         (cons (list 'ct (third (car s))) (put x y (cdr s))))
        (T (cons (car s) (put x y (cdr s))))))

; explicit application of puttable-operator
(defun puttable (x s)
  (cond ((null s) (print `(puttable ,x))
         nil)
        ((and (equal (first (car s)) 'on)
              (equal (second (car s)) x))
         (cons (list 'ct (third (car s)))
               (puttable x (cdr s))))
        (T (cons (car s) (puttable x (cdr s))))))



; TOWER-2: Building a list (tower) of sorted numbers (blocks)

; Representation: each partial tower as list
; Input: list of lists
; Examples for the three blocks world
; ((1 2 3))
; ((2 3) (1))
; ((1) (2) (3))
; ((2 1) (3))
; ((3 2 1))
; ((1 2) (3))
; ((1 3) (2))
; ((3 1) (2))
; ((3 2) (1))
; ((1 3 2))
; ((2 3 1))
; ((2 1 3))
; ((3 1 2))

; help functions

; flattens a list l
(defun flatten (l)
  (cond ((null l) nil)
        (T (append (car l) (flatten (cdr l))))))

; x+1 = y?
(defun onedif (x y) (= (1+ x) y))

; blocks world selectors

; topmost block of a tower
(defun topof (tw) (car tw))

; bottom block (base) of a tower
(defun bottom (tw) (car (last tw)))

; next tower
; f.e. ((2 1) (3)) -> (2 1)
(defun get-tower (l) (car l))

; tops of all current towers
(defun topelements (l) (sort (map 'list #'car l) #'>))

; topblock with highest number
(defun greatest (l) (car (topelements l)))

; topblock with second highest number
(defun scndgreatest (l) (cadr (topelements l)))

; label of the block with the highest number
(defun maxblock (l)
  (cond ((null l) 0)
        (T (car (sort (flatten l) #'>)))))

(defun get-all-no-towers (l max)
  (cond ((null l) nil)
        ((and (equal (bottom (car l)) max) (sorted (get-tower l)))
         (get-all-no-towers (cdr l) max))
        ((single-block (get-tower l)) (get-all-no-towers (cdr l) max))
        (T (cons (car l) (get-all-no-towers (cdr l) max)))))

(defun find-greatest (max l)
  (cond ((null l) max)
        ((> (topof max) (topof (car l))) (find-greatest max (cdr l)))
        (T (find-greatest (car l) (cdr l)))))

; find incorrect tower containing highest element
(defun greatest-no-tower (l)
  (cond ((null l) nil)
        (T (find-greatest (car (get-all-no-towers l (maxblock l)))
                          (cdr (get-all-no-towers l (maxblock l)))))))



blocksworld predicates 



; is tower only a single block?
(defun single-block (tw) (= (length tw) 1))

; exist two partial towers which top elements differ only by one?
(defun exist-free-neighbours (l) (onedif (scndgreatest l) (greatest l)))

; exists a correct partial tower?
; f.e. (2 3) or (B C)
(defun exists-tower (l)
  (cond ((null l) nil)
        ((and (equal (bottom (get-tower l)) (maxblock l))
              (sorted (get-tower l))) T)
        (T (exists-tower (cdr l)))
  ))

; is block x predecessor to top of a tower?
(defun successor (x tw)
  (cond ((null tw) T)
        ((onedif x (car tw)) T) ; (successor x (cdr tw)))
        (T nil)
  ))

; is tower sorted?
(defun sorted (tw)
  (cond ((null tw) T)
        ((successor (car tw) (cdr tw)) (sorted (cdr tw)))
        (T nil)
  ))

; exists only one tower?
(defun single-tower (l) (null (cdr l)))






; goal state?
(defun is-tower (l) (and (single-tower l) (sorted (get-tower l))))

blocksworld operators

; put x on y
(defun put (x y l)
  (cond ((null l) (print 'put) (print x) (print y)
         nil)
        ((equal (caar l) x) (cond ((not (null (cdar l)))
                                   (append (list (cdar l)) (put x y (cdr l))))
                                  (T (put x y (cdr l)))))
        ((equal (caar l) y) (cons (cons x (car l)) (put x y (cdr l))))
        (T (cons (car l) (put x y (cdr l))))
  ))

; puttable x
(defun puttable (x l)
  (cond ((null l) nil)
        ((equal (caar l) x) (print 'puttable) (print x)
         (cons (list x) (cons (cdar l) (cdr l))))
        (T (cons (car l) (puttable x (cdr l))))
  ))

main function

(defun tower (l)
  (cond ((is-tower l) l)
        ((and (exists-tower l)
              (exist-free-neighbours l))
         (tower (put (scndgreatest l) (greatest l) l)))
        (T (tower (puttable (topof (greatest-no-tower l)) l)))
  ))
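
As a small illustration of the list representation, the following sketch restates the predicates onedif, successor and sorted from the listing and filters the thirteen example states of the three blocks world for goal states. The goal test is written out inline, and the names *states* and *goal-states* are introduced here for illustration only; they are not part of the original program.

```lisp
;; Predicates restated from the TOWER-2 listing.
(defun onedif (x y) (= (1+ x) y))

(defun successor (x tw)
  (cond ((null tw) t)
        ((onedif x (car tw)) t)
        (t nil)))

(defun sorted (tw)
  (cond ((null tw) t)
        ((successor (car tw) (cdr tw)) (sorted (cdr tw)))
        (t nil)))

;; Goal test: exactly one tower, and that tower is sorted.
(defun is-tower (l) (and (null (cdr l)) (sorted (car l))))

;; Hypothetical helper data: the thirteen example states listed above.
(defparameter *states*
  '(((1 2 3)) ((2 3) (1)) ((1) (2) (3)) ((2 1) (3)) ((3 2 1))
    ((1 2) (3)) ((1 3) (2)) ((3 1) (2)) ((3 2) (1))
    ((1 3 2)) ((2 3 1)) ((2 1 3)) ((3 1 2))))

;; Keep only the states accepted by the goal test.
(defparameter *goal-states* (remove-if-not #'is-tower *states*))
```

Under this reading only the state ((1 2 3)) passes the goal test; a single tower such as (3 2 1) is rejected because its blocks do not ascend by one from top to bottom.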



C.7 Water Jug Problems 

In the following we present all problems used in experiments 1 and 2. Because
the graphs of the problem structures are large, we give the equations and
inequations (terms) instead. For each problem, a graph can be constructed
by representing the terms as demonstrated in figure 11.3. The terms are
integrated into a single structure by using each parameter
($c_A, c_B, \ldots, q_A, q_B, \ldots$) as a unique node.



Problems for Experiment 1 



Initial Problem

Jug                A    B    C    (D)
Capacity           28   35   42   -
Initial quantity   12   21   26   -
Goal quantity      19   0    40   -

Operator sequence: pour(C,B), pour(B,A), pour(A,C), pour(B,A)
Relevant Procedural Information/Relevant Declarative Information
(Constraints), see Source problem

Source Problem

Jug                A    B    C    (D)
Capacity           36   45   54   -
Initial quantity   16   27   34   -
Goal quantity      25   0    52   -

Operator sequence: pour(C,B), pour(B,A), pour(A,C), pour(B,A)

Relevant Procedural Information

$q_A(4) = q_A(0) + (c_A - q_A(0)) - c_A + (c_B - (c_A - q_A(0)))$
$q_B(4) = q_B(0) + (c_B - q_B(0)) - (c_A - q_A(0)) - (c_B - (c_A - q_A(0)))$
$q_C(4) = q_C(0) - (c_B - q_B(0)) + c_A$

Relevant Declarative Information (Constraints)

$c_A = 2 \cdot (c_B - q_B(0))$
$c_B > (c_A - q_A(0))$
$q_C(0) > (c_B - q_B(0))$
$c_C < q_C(0) + c_A$
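
The operator sequence can be checked against the table by simulating it. The following is a minimal sketch, assuming that pour(X,Y) transfers water from X to Y until X is empty or Y is full (an assumption that is consistent with the equations above). The function names jug-index, pour and run-sequence and the variable *source-result* are hypothetical and do not appear in the original.

```lisp
;; A state is a list of quantities in jug order (A B C D).

(defun jug-index (jug)
  "Map a jug symbol A, B, C or D to its list position 0 to 3."
  (position jug '(a b c d)))

(defun pour (from to capacities state)
  "Assumed semantics: pour FROM into TO until FROM is empty or TO is full.
Returns a fresh state list; STATE itself is not modified."
  (let* ((i (jug-index from))
         (j (jug-index to))
         (amount (min (nth i state)
                      (- (nth j capacities) (nth j state))))
         (new (copy-list state)))
    (decf (nth i new) amount)
    (incf (nth j new) amount)
    new))

(defun run-sequence (steps capacities state)
  "Apply a list of (FROM TO) pairs from left to right."
  (dolist (step steps state)
    (setf state (pour (first step) (second step) capacities state))))

;; Source problem: capacities 36, 45, 54; initial quantities 16, 27, 34.
(defparameter *source-result*
  (run-sequence '((c b) (b a) (a c) (b a))
                '(36 45 54)
                '(16 27 34)))
```

With these inputs *source-result* evaluates to (25 0 52), the goal quantities of the source problem; the same call with the capacities and initial quantities of the initial problem yields (19 0 40).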



Problem 1 (Isomorph/No Surface Change)

Jug                A    B    C    (D)
Capacity           48   60   72   -
Initial quantity   21   36   45   -
Goal quantity      33   0    69   -

Operator sequence: pour(C,B), pour(B,A), pour(A,C), pour(B,A)
Structural distance to source: $d_{S1} = 0$
Relevant Procedural Information/Relevant Declarative Information
(Constraints), see Source problem

Problem 2 (Isomorph/Small Surface Change) (Source/Target Mapping:
A/B, B/A, C/C)

Jug                A    B    C    (D)
Capacity           60   48   72   -
Initial quantity   36   21   45   -
Goal quantity      0    33   69   -

Operator sequence: pour(C,A), pour(A,B), pour(B,C), pour(A,B)
Structural distance to source: $d_{S2} = 0$

Relevant Procedural Information

$q_A(4) = q_A(0) + (c_A - q_A(0)) - (c_B - q_B(0)) - (c_A - (c_B - q_B(0)))$
$q_B(4) = q_B(0) + (c_B - q_B(0)) - c_B + (c_A - (c_B - q_B(0)))$
$q_C(4) = q_C(0) - (c_A - q_A(0)) + c_B$

Relevant Declarative Information (Constraints)

$c_B = 2 \cdot (c_A - q_A(0))$
$c_A > (c_B - q_B(0))$
$q_C(0) > (c_A - q_A(0))$
$c_C < q_C(0) + c_B$

Problem 3 (Isomorph/Large Surface Change) (Source/Target Mapping:
A/B, B/C, C/A)

Jug                A    B    C    (D)
Capacity           72   48   60   -
Initial quantity   45   21   36   -
Goal quantity      69   33   0    -

Operator sequence: pour(A,C), pour(C,B), pour(B,A), pour(C,B)
Structural distance to source: $d_{S3} = 0$

Relevant Procedural Information

$q_A(4) = q_A(0) - (c_C - q_C(0)) + c_B$
$q_B(4) = q_B(0) + (c_B - q_B(0)) - c_B + (c_C - (c_B - q_B(0)))$
$q_C(4) = q_C(0) + (c_C - q_C(0)) - (c_B - q_B(0)) - (c_C - (c_B - q_B(0)))$

Relevant Declarative Information (Constraints)

$c_B = 2 \cdot (c_C - q_C(0))$
$c_C > (c_B - q_B(0))$
$q_A(0) > (c_C - q_C(0))$
$c_A < q_A(0) + c_B$

Problem 4 (Partial Isomorph/No Surface Change) (additional jug: A;
Source/Target Mapping: A/B, B/C, C/D)

Jug                A    B    C    D
Capacity           16   20   25   31
Initial quantity   3    8    15   18
Goal quantity      0    13   3    28

Operator sequence: pour(D,C), pour(C,B), pour(B,D), pour(C,B), pour(A,C)
Structural distance to source: $d_{S4} = 0.37$

Relevant Procedural Information

$q_A(5) = q_A(0) - q_A(0)$
$q_B(5) = q_B(0) + (c_B - q_B(0)) - c_B + (c_C - (c_B - q_B(0)))$
$q_C(5) = q_C(0) + (c_C - q_C(0)) - (c_B - q_B(0)) - (c_C - (c_B - q_B(0)))$
$q_D(5) = q_D(0) - (c_C - q_C(0)) + c_B$

Relevant Declarative Information (Constraints)

$c_B = 2 \cdot (c_C - q_C(0))$
$c_C > (c_B - q_B(0))$
$c_C > q_A(0)$
$q_D(0) > (c_C - q_C(0))$
$c_D < q_D(0) + c_B$

Problem 5 (Partial Isomorph/Small Surface Change) (additional jug: A;
Source/Target Mapping: A/C, B/B, C/D)

Jug                A    B    C    D
Capacity           16   25   20   31
Initial quantity   3    15   8    18
Goal quantity      0    3    13   28

Operator sequence: pour(D,B), pour(B,C), pour(C,D), pour(B,C), pour(A,B)
Structural distance to source: $d_{S5} = 0.37$

Relevant Procedural Information

$q_A(5) = q_A(0) - q_A(0)$
$q_C(5) = q_C(0) + (c_C - q_C(0)) - c_C + (c_B - (c_C - q_C(0)))$
$q_B(5) = q_B(0) + (c_B - q_B(0)) - (c_C - q_C(0)) - (c_B - (c_C - q_C(0)))$
$q_D(5) = q_D(0) - (c_B - q_B(0)) + c_C$

Relevant Declarative Information (Constraints)

$c_C = 2 \cdot (c_B - q_B(0))$
$c_B > (c_C - q_C(0))$
$c_B > q_A(0)$
$q_D(0) > (c_B - q_B(0))$
$c_D < q_D(0) + c_C$



Problems for Experiment 2 

(Initial problem and source problem are identical to experiment 1.) 

Problem 1 (Target Exhaustiveness) (Source/Target Mapping: A/A,
B/B, C/C)

Jug                A    B    C    (D)
Capacity           48   60   72   -
Initial quantity   21   36   45   -
Goal quantity      0    33   69   -

Operator sequence: pour(C,B), pour(B,A), pour(A,C)
Structural distance to source: $d_{S1} = 0.16$

Relevant Procedural Information

$q_A(3) = q_A(0) + (c_A - q_A(0)) - c_A$
$q_B(3) = q_B(0) + (c_B - q_B(0)) - (c_A - q_A(0))$
$q_C(3) = q_C(0) - (c_B - q_B(0)) + c_A$

Relevant Declarative Information (Constraints)

$c_A = 2 \cdot (c_B - q_B(0))$
$c_B > (c_A - q_A(0))$
$q_C(0) > (c_B - q_B(0))$
$c_C < q_C(0) + c_A$

Problem 2 (Source Inclusiveness) (Source/Target Mapping: A/A, B/B,
C/C)

Jug                A    B    C    (D)
Capacity           48   60   72   -
Initial quantity   21   36   45   -
Goal quantity      33   60   9    -

Operator sequence: pour(C,B), pour(B,A), pour(A,C), pour(B,A), pour(C,B)
Structural distance to source: $d_{S2} = 0.17$

Relevant Procedural Information

$q_A(5) = q_A(0) + (c_A - q_A(0)) - c_A + (c_B - (c_A - q_A(0)))$
$q_B(5) = q_B(0) + (c_B - q_B(0)) - (c_A - q_A(0)) - (c_B - (c_A - q_A(0))) + c_B$
$q_C(5) = q_C(0) - (c_B - q_B(0)) + c_A - c_B$

Relevant Declarative Information (Constraints)

$c_A = 2 \cdot (c_B - q_B(0))$
$c_B > (c_A - q_A(0))$
$q_C(0) > (c_B - q_B(0))$
$q_C(0) < c_B$
$c_C < q_C(0) + c_A$

Problem 3 (“High” Structural Overlap) (identical to problem 4 in
experiment 1)

Problem 4 (“Medium” Structural Overlap) (additional jug: A;
Source/Target Mapping: A/B, B/C, C/D)

Jug                A    B    C    D
Capacity           16   20   25   31
Initial quantity   6    9    15   18
Goal quantity      14   14   0    20

Operator sequence: pour(D,C), pour(C,B), pour(D,A), pour(B,D), pour(C,B)
Structural distance to source: $d_{S4} = 0.55$

Relevant Procedural Information

$q_A(5) = q_A(0) + (q_D(0) - (c_C - q_C(0)))$
$q_B(5) = q_B(0) + (c_B - q_B(0)) - c_B + (c_C - (c_B - q_B(0)))$
$q_C(5) = q_C(0) + (c_C - q_C(0)) - (c_B - q_B(0)) - (c_C - (c_B - q_B(0)))$
$q_D(5) = q_D(0) - (c_C - q_C(0)) - (q_D(0) - (c_C - q_C(0))) + c_B$

Relevant Declarative Information (Constraints)

$c_A < (q_A(0) + q_D(0))$
$q_A(0) < (c_C - q_C(0))$
$c_C > (c_B - q_B(0))$

CD > (9z>(0) + Cb 

Problem 5 (“Low” Structural Overlap) (additional jug: A;
Source/Target Mapping: A/B, B/C, C/D)

Jug                A    B    C    D
Capacity           17   20   25   31
Initial quantity   7    9    15   18
Goal quantity      16   5    0    28

Operator sequence: pour(B,A), pour(D,C), pour(C,B), pour(B,D), pour(C,B)
Structural distance to source: $d_{S5} = 0.59$

Relevant Procedural Information

$q_A(5) = q_A(0) + q_B(0)$
$q_B(5) = q_B(0) - q_B(0) + c_B - c_B + (c_C - q_C(0))$
$q_C(5) = q_C(0) + (c_C - q_C(0)) - c_B - (c_C - q_C(0))$
$q_D(5) = q_D(0) - (c_C - q_C(0)) + c_B$

Relevant Declarative Information (Constraints)

$c_B = 2 \cdot (c_C - q_C(0))$
$c_A > q_A(0) + q_B(0)$
$c_C > c_B$
$q_D(0) > (c_C - q_C(0))$
$c_D < q_D(0) + c_B$



C.8 Example RPSs 



In the following, the ten example programs used in the empirical evaluation
of retrieval in chapter 12 are given. Note that we only give the subprograms.
The complete RPSs additionally contain a call of the corresponding subprogram.

#S(sub
   :NAME REVERT
   :PARAMS (L1 L2)
   :BODY (IF (NULL L1) L2 (REVERT (TL L1) (++ (HD L1) L2)))
)

#S(sub
   :NAME SUM
   :PARAMS (X)
   :BODY (IF (EQ0 X) 0 (+ X (SUM (PRE X))))
)

#S(sub
   :NAME CB
   :PARAMS (X SIT)
   :BODY (IF (CT X) SIT (PT (TO X) (CB (TO X) SIT)))
)

#S(sub
   :NAME GGT
   :PARAMS (X Y)
   :BODY (IF (EQ0 Y) X (GGT Y (MOD X Y)))
)

#S(sub
   :NAME MOD
   :PARAMS (X Y)
   :BODY (IF (< X Y) X (MOD (- X Y) Y))
)

#S(sub
   :NAME FAC
   :PARAMS (X)
   :BODY (IF (EQ0 X) 1 (* X (FAC (PRE X))))
)

#S(sub
   :NAME APPEND
   :PARAMS (L1 L2)
   :BODY (IF (NULL L1) L2 (++ (HD L1) (APPEND (TL L1) L2)))
)

#S(sub
   :NAME MEMBER
   :PARAMS (X L)
   :BODY (IF (NULL L) F (IF (== X (HD L)) T (MEMBER X (TL L))))
)

#S(sub
   :NAME LAH
   :PARAMS (X)
   :BODY
   (IF (EQ0 X) 1
       (IF (EQ0 (PRE X)) 1
           (- (* (PRE (* 2 X)) (LAH (PRE X)))
              (* (* (PRE X) (PRE (PRE X))) (LAH (PRE (PRE X))))
           )))
)

#S(sub
   :NAME BINOM
   :PARAMS (X Y)
   :BODY (IF (EQ0 X) 1
             (IF (== X Y) 1 (+ (BINOM (PRE X) (PRE Y)) (BINOM (PRE X) Y)))
         )
)

#S(sub
   :NAME FIBO
   :PARAMS (X)
   :BODY
   (IF (EQ0 X) 0
       (IF (EQ0 (PRE X)) 1 (+ (FIBO (PRE X)) (FIBO (PRE (PRE X)))))
   )
)
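
To make the subprogram bodies concrete, some of them can be transcribed into ordinary Lisp functions. The following is a minimal sketch, assuming that EQ0 tests for zero and PRE is the predecessor on natural numbers; these two definitions and the rps- prefixed function names are introduced here for illustration (the prefix avoids a clash with the built-in MOD) and are not part of the original RPS representation.

```lisp
;; Assumed readings of the primitives used in the subprogram bodies.
(defun eq0 (x) (= x 0))
(defun pre (x) (- x 1))

;; SUM: (IF (EQ0 X) 0 (+ X (SUM (PRE X))))
(defun rps-sum (x)
  (if (eq0 x) 0 (+ x (rps-sum (pre x)))))

;; FAC: (IF (EQ0 X) 1 (* X (FAC (PRE X))))
(defun rps-fac (x)
  (if (eq0 x) 1 (* x (rps-fac (pre x)))))

;; MOD: (IF (< X Y) X (MOD (- X Y) Y))
(defun rps-mod (x y)
  (if (< x y) x (rps-mod (- x y) y)))

;; GGT (greatest common divisor): (IF (EQ0 Y) X (GGT Y (MOD X Y)))
(defun rps-ggt (x y)
  (if (eq0 y) x (rps-ggt y (rps-mod x y))))
```

For non-negative integer arguments (and a positive second argument for rps-mod), (rps-sum 4) evaluates to 10, (rps-fac 5) to 120 and (rps-ggt 12 18) to 6.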
situation variable, 229

- introduction, 229, 234 
skeleton, 188 

skill 

- acquisition, 272

- cognitive, 272 

skill acquisition, 52, 124 
SME, 286, 294 
SOAR, 52 
Soar, 273 

software engineering, 99, 100, 311 
- computer-aided, 100

- knowledge-based, 100, 109, 123 
solution, 28 

sorting, 71, 88, 248, 271 

- bubble-sort, 163, 249 

- domain, 40, 61 

- program, 112 

- selection sort, 248, 249, 251 
source inclusiveness, 294 
source program, 312 
specification, 99-101, 119, 311 

- as theorem, 107 

- by examples, 104 

- complete, 103 

- formal, 103, 123 

- incomplete, 99, 104 

- language, 103 

- natural language, 102, 109 

- non-executable, 121 
specification language, 104 
stack, 30 

STAN, 43, 262 
state description 

- complete, 37, 56 

- consistent, 38 

- inconsistent, 37, 38 

- partial, 37, 56 
state invariant, 51 
state representation, 15 

- complete, 48 

- partial, 36, 48 
state space, 17, 33, 70 
state transformation, 15 
state transition, 46 
state variable, 73 
state-action rule, 45 
statics, 16, 17, 23, 72 
Strips, 24, 27, 56 

- domain, 17 

- Functional, 73 

- goal, 16 

- language, 14, 43 

- operator, 16 

- planner, 40 

- planning problem, 17 

- state, 15 

structural similarity, 292, 301, 305, 309 
structure mapping, 286, 292, 312 

- theory, 292 
sub-plan

- uniform, 231 
sub-schema 

- equivalence, 203 
- valid, 191
sub-term, 169 
subprogram 

- body, 197 

- calculation steps, 216 

- calculation strategy, 217 

- dependency relation, 182 

- equality, 219 

- language, 175 
substitution, 16, 170, 177, 216 

- algorithm, 214 

- consistency, 212 

- inverse, 143 

- necessary condition, 211 

- product, 77 

- testing recurrence, 210 

- testing uniqueness, 209 

- uniqueness, 208 
substitution term, 177 

- calculation, 214 

- characteristics, 209 

- context, 321 
subsumption, 283 
successor identification, 236
Sussman anomaly, 34, 38 
symbolic model checking, 43, 46 
Synapse, 150 

synthesis problem, 185 
synthesis theorem, 155 

- basic, 157 

tactic, 49 

target exhaustiveness, 294 
target problem, 312 
term, 168 

- depth, 147 

- level, 147 

- order, 171 

- replacement, 170 

- substitution, 314, 320 

- subsumption, 316 
term algebra, 160 

term rewrite system, 170 
term rewriting, 171 
theorem prover, 25 

- Boyer-Moore, 50 
theorem proving, 37, 40, 49 

- constructive, 110, 113, 123, 164 

- drawbacks, 112 
Thesys, 151 
TIM, 51, 150 
Tinker, 163 
Tower of Hanoi, 32, 62, 125, 126, 222,
260, 267, 274, 295, 296
trace, 160 

traces, 104, 108, 150, 152, 160, 163, 167 

- user-generated, 133 
training examples, 125 

- positive/negative, 141 
transformation 

- correctness, 119, 120 

- distance measure, 283 

- lateral, 115 

- plan to program, 227 

- vertical, 115, 311 
transformation rule, 107, 113, 119 
transformational semantics, 119 
tree 

- regular, 253 

- regularization, 253 
type inference, 51 

UMOP, 46 

unfolding, 116, 133, 157, 163, 180, 312, 
313 

- depth, 150 

- indices, 178 
unification 

- higher order, 314 
universal plan, 65 

- full, 65 

- optimal, 65 

- order over states, 65 
unpack, 152, 155 
unstack, 237 
update, 72, 75, 78 

- backward, 81 
user-modeling, 101 
utility problem, 52 

variable addition, 158 
variable instantiation, 174, 216 

- maximal pattern, 198 

- segment, 210 
variables 

- range, 76 

- sufficient instances, 211 

Warshall-algorithm, 120 
water jug domain, 85, 296 



